diff --git a/.github/workflows/mdlint.yml b/.github/workflows/mdlint.yml index 31f82de..5a26f2b 100644 --- a/.github/workflows/mdlint.yml +++ b/.github/workflows/mdlint.yml @@ -17,9 +17,13 @@ jobs: - name: Install mdlint run: npm install -g @moonbit/markdown-linter - - name: Install moonBit + - name: Install MoonBit run: /bin/bash -c "$(curl -fsSL https://cli.moonbitlang.com/ubuntu_x86_64_moon_setup.sh)" + - uses: DavidAnson/markdownlint-cli2-action@v16 + with: + globs: 'docs/*.md' + - name: Get changed files id: changed-files uses: tj-actions/changed-files@v43 diff --git a/docs/00-course-overview.md b/docs/00-course-overview.md index 2a3c793..af7ba3b 100644 --- a/docs/00-course-overview.md +++ b/docs/00-course-overview.md @@ -52,4 +52,4 @@ In this course, we will be using MoonBit as our programming language. The instal ## Acknowledgements -This course draws inspiration from [UPenn CIS 1200](https://www.seas.upenn.edu/~cis120/current/). \ No newline at end of file +This course draws inspiration from [UPenn CIS 1200](https://www.seas.upenn.edu/~cis120/current/). diff --git a/docs/01-program-design.md b/docs/01-program-design.md index 80c65f2..027a511 100644 --- a/docs/01-program-design.md +++ b/docs/01-program-design.md @@ -109,4 +109,4 @@ It is recommended to adopt a TDD workflow, namely, Modern software products are typically vast in scale, making TDD a reliable workflow for their development. By creating test cases in advance, developers can efficiently identify and rectify potential errors at an early stage, while also ensuring the seamless integration of new functions without disrupting existing ones. -Quiz: For some abnormal inputs, the sample program for the water bottles problem may fail. Can you identify them? (Hint: In MoonBit, the range of `Int` values is $-2^{31}$ to $2^{31} - 1$.) \ No newline at end of file +Quiz: For some abnormal inputs, the sample program for the water bottles problem may fail. Can you identify them? 
(Hint: In MoonBit, the range of `Int` values is $-2^{31}$ to $2^{31} - 1$.) diff --git a/docs/02-development-environments-expressions.md b/docs/02-development-environments-expressions.md index 05ec813..83fd844 100644 --- a/docs/02-development-environments-expressions.md +++ b/docs/02-development-environments-expressions.md @@ -67,7 +67,7 @@ In the above program, a top-level function and a test block are defined. In the Since this program does not generate any output, how exactly is it executed? -In order to write accurate programs, it is essential to understand how programs are executed. Therefore, it is necessary to establish a computational model that comprehends the process. MoonBit programs can be viewed using an expression-oriented programming approach. They are composed of expressions that represent values, and their execution involves reducing these expressions. +In order to write accurate programs, it is essential to understand how programs are executed. Therefore, it is necessary to establish a computational model that comprehends the process. MoonBit programs can be viewed using an expression-oriented programming approach. They are composed of expressions that represent values, and their execution involves reducing these expressions. In contrast, imperative programming consists of statements that may modify the program's state. For example, statements may include "create a variable named `x`", "assign `5` to `x`", or "let `y` point to `x`", etc. @@ -95,6 +95,7 @@ In static type systems, type checking is performed **before** the program is exe MoonBit has a static type system, where its compiler performs type checking before runtime. This approach aims to minimize the likelihood of encountering runtime errors stemming from the execution of operations on incompatible data types, such as attempting arithmetic calculations on Boolean values. By conducting type checking in advance, MoonBit strives to prevent program interruptions and ensure accurate outcomes. 
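As a small illustration (a hypothetical snippet, not taken from the course materials), the following binding would be rejected by the compiler before anything runs, because the annotated type and the value's type disagree:

```moonbit
// Rejected at compile time: `true` is a Bool, but the annotation demands Int.
// let n: Int = true
// Accepted: the annotation matches the value's type.
let n: Int = 42
```

The faulty line never gets a chance to misbehave at runtime; the type checker stops it first.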
In MoonBit, each **identifier** can be associated with a unique type with a colon `:`. For example, + - `x: Int` - `a: Double` - `s: String` @@ -122,6 +123,7 @@ While this chapter will not explore the underlying implementation of data, such The first data type we will introduce here is the Boolean value, also known as a logical value. It is named after the mathematician George Boole, who is credited with inventing Boolean algebra. In MoonBit, the type for Boolean values is `Bool`, and it can only have two possible values: `true` and `false`. The following are three basic operations it supports: + - NOT: true becomes false, false becomes true. - Example: `not(true) == false` - AND: both must be true to be true. @@ -140,6 +142,7 @@ Quiz: How to define XOR (true if only one is true) using OR, AND, and NOT? In mathematics, the set of integers is denoted as $\mathbb{Z}$ and is considered a countably infinite set. However, in computer science, integers in programming languages typically have a limited range due to hardware constraints. In MoonBit, there are two integer types, each with a different range: + - Integer `Int`: ranging from $-2^{31}$ to $2^{31}-1$ - Long integer `Int64`: ranging from $-2^{63}$ to $2^{63}-1$ @@ -178,10 +181,12 @@ It is important to note that each character in MoonBit corresponds strictly to a #### Tuples Sometimes, it is necessary to represent data types that combine multiple pieces of information. For instance, a date can be represented by three numbers, and a person's personal information may include their name and age. In such cases, tuples can be used to combine data of different types with a fixed length. Tuples allow us to group together multiple values into a single entity. + - `(2023, 10, 24): (Int, Int, Int)` - `("Bob", 3): (String, Int)` We can access the data by using zero-based indexing. 
+ - `(2023, 10, 24).0 == 2023` - `(2023, 10, 24).1 == 10` @@ -225,6 +230,7 @@ flowchart LR ``` We can denote the reduction of an $\texttt{}$ to a $\texttt{}$ as $\texttt{} \Rightarrow \texttt{}$. For example, + - $3 \Rightarrow 3$ (the reduction result of a value is itself) - $3 + 4 \Rightarrow 7$ - $2 * (4 + 5) \Rightarrow 18$ @@ -245,6 +251,7 @@ Therefore, $(2 + 3) * (5 - 2) \Rightarrow 15$. #### Variable Binding In MoonBit, variable binding can be achieved using the syntax `let : = `. It assigns an identifier to a value that is represented by an expression. In many cases, the type declaration is optional as the compiler can infer it based on the type of the expression. + - `let x = 10` - `let y = "String"` @@ -255,6 +262,7 @@ By utilizing variable binding effectively, you can avoid complex nesting of expr #### Expression Blocks and Scope In MoonBit, expression blocks can be defined using the syntax + ``` { Variable bindings @@ -279,6 +287,7 @@ It is important to note the direction of the arrows. On line 7, the `tmp` refers #### Expression Reduction under Variable Binding Expression reduction can be broken down into the following steps: + - Reduce the expression on the right-hand side of the variable binding. - **Replace** occurrences of identifiers with their reduction results. - Omit the variable binding part. @@ -411,4 +420,4 @@ In this chapter, we learned: - Integers and floating-point numbers - Characters and strings - Tuples -- How to view MoonBit programs in terms of expressions and values, and understand the execution of MoonBit programs by reduction. \ No newline at end of file +- How to view MoonBit programs in terms of expressions and values, and understand the execution of MoonBit programs by reduction. 
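The reduction steps for variable bindings summarized above can be sketched as a hand-worked trace (arrows are annotations, not runnable MoonBit):

```
let x = 1 + 2    // step 1: reduce the right-hand side, 1 + 2 => 3
x * x
=> 3 * 3         // step 2: replace occurrences of x; step 3: omit the binding
=> 9
```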
diff --git a/docs/03-functions-lists-recursion.md b/docs/03-functions-lists-recursion.md index 947180b..1172a04 100644 --- a/docs/03-functions-lists-recursion.md +++ b/docs/03-functions-lists-recursion.md @@ -58,7 +58,7 @@ fn one () -> Int { 1 } -fn add_char(ch: Char, str: String) -> String { +fn add_char(ch: Char, str: String) -> String { ch.to_string() + str } ``` @@ -70,6 +70,7 @@ This syntax enables you to use a function using the interface it provides, witho If a function is defined, it can be **applied** with `(, ...)`, e.g., `one()` and `add_char('m', "oonbit")`. When applying a function, it is important to ensure that the number of parameters and their types align with the function definition. That is, the order of parameters cannot be disrupted: ~~`add_char("oonbit", 'm')`~~. The evaluation of a function application follows the following steps: + - Evaluate the parameters **from left to right**. - Replace the occurrences of the parameters with their values. - Reduce the expressions in the function body. @@ -77,8 +78,8 @@ The evaluation of a function application follows the following steps: For example: ```moonbit expr -fn add_char(ch: Char, str: String) -> String { - ch.to_string() + str +fn add_char(ch: Char, str: String) -> String { + ch.to_string() + str } let moonbit: String = add_char(Char::from_int(109), "oonbit") @@ -112,10 +113,12 @@ In contrast with partial functions, the functions that do define an output for e To prevent program termination caused by forbidden operations and to distinguish between valid and invalid inputs, the `Option[T]` data type is employed. A value of type `Option[T]` falls into one of the two cases: + - `None`: the absence of a value. - `Some(value: T)`: the presence of a value of type `T`. 
For example, we can define a total function for integer division using the option type: + ```moonbit expr fn div(a: Int, b: Int) -> Option[Int] { if b == 0 { None } else { Some(a / b) } @@ -125,6 +128,7 @@ fn div(a: Int, b: Int) -> Option[Int] { If `b == 0`, we will return `None` instead of raising an error; otherwise, we will return the quotient wrapped by `Some`. In `Option[T]`, the notation `[T]` indicates that `Option` is a generic type, and the value it holds is of type `T`. For example: + - `Option[Int]`: it can either hold a value of type `Int` or it can be empty. We will explore how to extract the value from `Some` shortly. @@ -159,6 +163,7 @@ In MoonBit, function types have the following form: ``` For example: + - `() -> Int` - `(Int, String, Char) -> Int` - `((Int, Int, Int)) -> (Int, Int, Int)` accepts a tuple and returns a tuple @@ -183,7 +188,7 @@ fn init { } ``` -By using labeled arguments, the order of the parameters becomes less important. In addition, they can be made optional by specifying a default value when declaring them. When the function is called, if no argument is explicitly provided, the default value will be used. +By using labeled arguments, the order of the parameters becomes less important. In addition, they can be made optional by specifying a default value when declaring them. When the function is called, if no argument is explicitly provided, the default value will be used. Consider the following example: @@ -206,6 +211,7 @@ It is important to note that the default value expression will be evaluated each ## Lists Data is everywhere. Sometimes, we have data with the following characteristics: + - The data is ordered. - The data can be duplicated. - The data can vary in length. @@ -217,6 +223,7 @@ For instance, let's consider the organization of our natural language snippets. Here, we will define a single-ended immutable integer list called `IntList`, where items can only be inserted at one and only one end, known as the head. 
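Peeking ahead at one possible concrete representation (a sketch; the chapter first builds up the interfaces, and later shows the same `Nil`/`Cons` shape generically as `List[T]`):

```moonbit
enum IntList {
  Nil                 // the empty list
  Cons(Int, IntList)  // an integer at the head, followed by the rest of the list
}
```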
Let's recall the workflow introduced in Chapter 1. After understanding the problem, we should define the interfaces, i.e., the operations that should be supported: + - Construction - `nil : () -> IntList`: construct an empty list - `cons : (Int, IntList) -> IntList`: add a new item into the list @@ -292,13 +299,14 @@ block-beta ``` The following examples help deepen our understanding of lists. + - The following are valid lists: - - `let int_list: List[Int] = Cons(1, Cons(2, Cons(3, Nil)))` - - `let string_list: List[String] = Cons("This", Cons("is", Cons("a", Cons("sentence.", Nil))))` + - `let int_list: List[Int] = Cons(1, Cons(2, Cons(3, Nil)))` + - `let string_list: List[String] = Cons("This", Cons("is", Cons("a", Cons("sentence.", Nil))))` - The following are not valid lists: - - `Cons(1, Cons(true, Cons(3, Nil)))`: Items are of different types. - - `Cons(1, 2)`: `2` itself is not a list. - - `Cons(1, Cons(Nil, Nil))`: Items are of different types. + - `Cons(1, Cons(true, Cons(3, Nil)))`: Items are of different types. + - `Cons(1, 2)`: `2` itself is not a list. + - `Cons(1, Cons(Nil, Nil))`: Items are of different types. Like `Option[T]`, the list type `List[T]` is also generic. @@ -333,6 +341,7 @@ In the above example, to access the first item of a list of integers, we use pat #### Reduction of Pattern Matching Expressions The reduction of pattern matching expressions follows the following steps: + - Reduce the expression to be matched. - Try the patterns in a sequential order until a successful match is found. - Replace the identifiers in the matched case with their corresponding values. @@ -352,16 +361,20 @@ let first_elem: Option[Int] = head_opt(Cons(1, Cons(2, Nil))) ```moonbit expr head_opt(Cons(1, Cons(2, Nil))) ``` + $\mapsto$ (Replace the identifiers in the function body.) 
+ ```moonbit expr -match List::Cons(1, Cons(2, Nil)) { +match List::Cons(1, Cons(2, Nil)) { Nil => Option::None Cons(head, tail) => Option::Some(head) } ``` + $\mapsto$ `Some(1)` (Perform pattern matching and replace the identifiers in the matched case.) The last step of reduction is equivalent to: + ```moonbit expr { let head = 1 @@ -384,6 +397,7 @@ fn get_or_else(option_int: Option[Int64], default: Int64) -> Int64 { ``` If we believe that the expression to be matched will not be `None`, we can write a partial function that omits the `None` pattern. + ```moonbit expr fn get(option_int: Option[Int64]) -> Int64 { match option_int { // Warning: Partial match @@ -396,6 +410,7 @@ fn get(option_int: Option[Int64]) -> Int64 { ## Recursion The following are examples of recursion: + - **G**NU is **N**ot **U**nix - **W**ine **I**s **N**ot an **E**mulator - Fibonacci sequence: each number is the sum of the two preceding ones. @@ -408,6 +423,7 @@ The following are examples of recursion: Recursion is the process of breaking down a problem into smaller subproblems that are similar to the original problem but of a reduced scale. For a function, recursion is the process of calling itself either directly or indirectly. It is crucial to ensure that a recursive function has at least one base case, otherwise, it will continue to run endlessly, much like the captain's stories. Here is an example of a recursive function: + ```moonbit fn fib(n: Int) -> Int { if n == 1 || n == 2 { 1 } else { fib (n-1) + fib (n-2) } @@ -431,7 +447,7 @@ Recursion can also be utilized to determine the parity of a natural number. If a Lists are defined in a recursive manner: they can be either an empty list or a combination of an item and a sublist. As a result, lists can be manipulated using recursive functions and pattern matching. 
-```moonbit +```moonbit fn length(list: List[Int]) -> Int { match list { Nil => 0 @@ -449,28 +465,37 @@ The evaluation of recursive functions follows a similar process to that of non-r ```moonbit expr length(List::Cons(1, Cons(2, Nil))) ``` + $\mapsto$ (Replace the identifiers in the function body.) + ```moonbit expr match List::Cons(1, Cons(2, Nil)) { Nil => 0 Cons(_, tl) => 1 + length(tl) // tl = Cons(2, Nil) } ``` + $\mapsto$ (Perform pattern matching and replace the identifiers in the matched case.) + ```moonbit expr 1 + length(List::Cons(2, Nil)) ``` + $\mapsto$ (Again, replace the identifiers in the function body.) + ```moonbit expr 1 + match List::Cons(2, Nil) { Nil => 0 Cons(_, tl) => 1 + length(tl) // tl = Nil } ``` + $\mapsto$ (Perform pattern matching and replace the identifiers in the matched case.) + ```moonbit expr 1 + 1 + length(Nil) ``` + ... $\mapsto$ `1 + 1 + 0` $\mapsto$ `2` @@ -492,6 +517,7 @@ fn tail(list: List[Int]) -> List[Int] { ``` Proof: + - If `a` is in the pattern of `Nil`, then the sublist `tail(a) == a`, and both have a length of $0$. Hence, the proposition holds. - If `a` is in the pattern of `Cons(head, tail)`, then the sublist `tail(Cons(head, tail)) == tail`. Since $l_1 = l_2 + 1 > l_2$, the proposition holds. - By mathematical induction, the original proposition is proven to be true. @@ -546,18 +572,21 @@ This performance is obviously unacceptable. Therefore, we can use the technique Dynamic programming (DP) refers to the algorithmic paradigm that solves a complex problem by decomposing it into smaller subproblems that are similar to the original problem but of a reduced scale. It is an optimization over naïve recursion. DP is applicable to optimization problems that have: + - **Overlapping subproblems**: DP solves each subproblem once and caches the result, avoiding redundant computations. - **Optimal substructure**: The global solution can be built from subproblems. 
Specifically, in cases where non-optimization problems exhibit a recursive substructure and their solutions are uniquely determined by this recursive form, the only solution can be regarded as the optimal solution. Therefore, dynamic programming algorithms can also be applied to these non-optimization problems, such as the Fibonacci sequence. DP algorithms can be implemented top-down or bottom-up: + - **Top-down**: For each subproblem, if it has already been solved, use the cached result; otherwise, solve it and cache the result. - **Bottom-up**: Solve the subproblems first, then calculate the solutions of larger subproblems from the smaller ones. ### Solving Fibonacci Sequence with DP DP is applicable to the problem of Fibonacci sequence, which has: + - Overlapping subproblems: Both $F_{n + 1}$ and $F_{n + 2}$ require $F_n$. - Recursive substructure: $F_n$ is determined by $F_{n - 1}$ and $F_{n - 2}$. @@ -629,7 +658,7 @@ fn fib1_mut(num: Int) -> Int64 { let result_1 = aux(num - 1) let result_2 = aux(num - 2) // Update the binding with = - map = put(map, num, result_1 + result_2) + map = put(map, num, result_1 + result_2) result_1 + result_2 } } @@ -645,9 +674,9 @@ In the bottom-up implementation, we typically start from the smallest subproblem ```moonbit expr fn fib2(num: Int) -> Int64 { fn aux(n: Int, map: @immut/sorted_map.T[Int, Int64]) -> Int64 { - let result = get_or_else(get(map, n - 1), 1L) + + let result = get_or_else(get(map, n - 1), 1L) + get_or_else(get(map, n - 2), 1L) - if n == num { result } + if n == num { result } else { aux(n + 1, put(map, n, result)) } } let map = put(put(make(), 0, 0L), 1, 1L) @@ -680,6 +709,7 @@ flowchart LR ## Summary In this chapter we learned + - Basic data type: functions and their operations - Data structure: lists and pattern matching on lists - Algorithm: recursion and dynamic programming @@ -687,18 +717,22 @@ In this chapter we learned We also get an informal idea of ​​computational complexity. 
This course is intended to be an introductory level introduction. For a deeper understanding of how to use mathematical induction to prove the correctness of structured recursion, please refer to: + - _**Software Foundations, Volume 1: Logical Foundations**_: Basics, Induction & Lists; or - _**Programming Language Foundations in Agda**_: Naturals, Induction & Relations If you want to learn more about algorithms and computational complexity, please refer to: + - _**Algorithms** (4e)_: Chapter 1 - Fundamentals; or - _**Introduction to Algorithms** (4e)_: Chapter 3 - Characterizing Running Times; or - _**Introduction to Algorithms** (3e)_: Chapter 3 - Growth of Functions To learn more about dynamic programming, please refer to: + - _**Introduction to Algorithms** (4e)_: Chapter 14 - Dynamic Programming; or - _**Introduction to Algorithms** (3e)_: Chapter 15 - Dynamic Programming Reference code: + - [Functions, Lists & Recursion](https://try.moonbitlang.com/examples/course/lec3/function_list_recursion.mbt) -- [Dynamic Programming](https://try.moonbitlang.com/examples/course/lec3/dynamic_programming.mbt) \ No newline at end of file +- [Dynamic Programming](https://try.moonbitlang.com/examples/course/lec3/dynamic_programming.mbt) diff --git a/docs/04-tuples-structs-enums.md b/docs/04-tuples-structs-enums.md index 74de5eb..e7ff740 100644 --- a/docs/04-tuples-structs-enums.md +++ b/docs/04-tuples-structs-enums.md @@ -35,9 +35,11 @@ Structures allow us to assign **names** to the data, both to the entire type and - ```moonbit struct PersonalInfo { name: String; age: Int } ``` + - ```moonbit struct ContactInfo { name: String; telephone: Int } ``` + - ```moonbit struct AddressInfo { address: String; postal: Int } ``` @@ -83,7 +85,7 @@ fn g(pair: (String, Int)) -> PersonalInfo { { name: pair.0, age: pair.1, }} Feel free to verify this. Similarly, `PersonalInfo` is isomorphic to `(Int, String)`. You can try defining the corresponding mappings yourself. 
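One possible pair of mappings witnessing this isomorphism (a sketch with hypothetical names `f2` and `g2`, assuming the `PersonalInfo` struct defined earlier in the chapter):

```moonbit
// Assumes: struct PersonalInfo { name: String; age: Int }
fn f2(info: PersonalInfo) -> (Int, String) { (info.age, info.name) }
fn g2(pair: (Int, String)) -> PersonalInfo { { name: pair.1, age: pair.0 } }
```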
-The key difference between tuples and structures lies in their compatibility. Tuples are *structural*, meaning they are compatible as long as the structure is the same – each field type corresponds one-to-one. For example, a function successfully accepts a tuple here. +The key difference between tuples and structures lies in their compatibility. Tuples are *structural*, meaning they are compatible as long as the structure is the same – each field type corresponds one-to-one. For example, a function successfully accepts a tuple here. ```moonbit fn accept(tuple: (Int, String)) -> Bool { @@ -124,8 +126,7 @@ fn get_or_else(option_int: Option[Int], default: Int) -> Int { } ``` -We have previously used pattern matching to inspect the structure of `List` and `Option`. For instance, using `Nil` and `Cons` to match lists; `None` and `Some` to match Options. In fact, pattern matching can match values (booleans, numbers, characters, strings) as well as constructors. - +We have previously used pattern matching to inspect the structure of `List` and `Option`. For instance, using `Nil` and `Cons` to match lists; `None` and `Some` to match Options. In fact, pattern matching can match values (booleans, numbers, characters, strings) as well as constructors. ```moonbit fn is_zero(i: Int) -> Bool { @@ -151,7 +152,7 @@ fn contains_zero(l: @immut/list.T[Int]) -> Bool { In this example, the branch `Cons(0, _)` matches lists starting with `0`. The branch `Cons(_, tl)` matches other lists, while binding the sublist to the identifier `tl` for further processing. The head of the current list is discarded by the wildcard. -Pattern matching for tuples and structures is just like for constructions. +Pattern matching for tuples and structures works just like for constructors. ```moonbit fn first(pair: (Int, Int)) -> Int { @@ -170,9 +171,9 @@ fn baby_name(info: PersonalInfo) -> Option[String] { Tuples' patterns are just like their definitions, enclosed in parentheses and separated by commas.
Make sure the length of the matched tuple is correct. Structure patterns are enclosed in braces and separated by commas. We have additional pattern forms to make pattern matching more flexible: -* Explicitly match some specific values, such as `age: 0` to match the data with specific values. -* Use another identifier to bind a field, such as `age: my_age`. This is useful when you don't want to use the field name as an identifier. -* Omit remaining fields with `..` at the end. +- Explicitly match specific values, such as `age: 0` to match only data whose `age` field is `0`. +- Use another identifier to bind a field, such as `age: my_age`. This is useful when you don't want to use the field name as an identifier. +- Omit remaining fields with `..` at the end. Here is another example to better understand how to use nested patterns. The `zip` function combines two lists into a new list of pairs like a zipper. The length of the resulting list is the minimum of the lengths of the input lists. Given the lists `[1, 2, 3]` and `['a', 'b', 'c', 'd']`, the zipped list would be `[(1, 'a'), (2, 'b'), (3, 'c')]`. @@ -191,7 +192,7 @@ Lastly, pattern matching is not limited to `match`; it can also be used in data ```moonbit no-check let ok_one = Result::Ok(1); -let Result::Ok(one) = ok_one; +let Result::Ok(one) = ok_one; let Result::Err(e) = ok_one; // Runtime error ``` @@ -222,7 +223,7 @@ enum { ; } Here, each possible variant is a constructor. For instance, `let monday = Monday`, where `Monday` defines the day of the week as Monday. Different enumerated types may cause conflicts because they might use the same names for some cases. In such cases, we distinguish them by adding `::` in front of the constructor, such as `DaysOfWeek::Monday`. -Now we need to ask, why do we need enumerated types?
Why not just use numbers from one to seven to represent Monday to Sunday? Let's compare the following two functions. ```moonbit no-check fn tomorrow(today: Int) -> Int @@ -234,7 +235,7 @@ The most significant difference is that functions defined with enumerated types Additionally, enumerated types prevent the representation of irrational data. For instance, when using various services, user identification can be based on either a phone number or an email, both of which are optional but only one is required. If we use a structure with two nullable fields to represent this, there is a risk of both fields being empty or both having data, which is not what we want. Therefore, enumerated types can be used to better restrict the range of reasonable data. -Each variant of an enumerated type can also carry data. For instance, we've seen the enumerated type `Option`. +Each variant of an enumerated type can also carry data. For instance, we've seen the enumerated type `Option`. ```moonbit no-check enum Option[T] { @@ -297,9 +298,9 @@ We've mentioned product types and sum types. Now, let me briefly introduce algeb The terms tuple, structure, and enumerated type, which we discussed earlier, are collectively referred to as algebraic data types. They are called algebraic data types because they construct types through algebraic operations, specifically "sum" and "product", and they exhibit algebraic structures. Recall the properties of regular numbers, such as equality, addition, multiplication, and the facts such that any number multiplied by 1 equals itself, any number plus 0 equals itself, etc. 
Similarly, algebraic data types exhibit properties such as: -* type equality implying isomorphism -* type multiplication forming product types (tuples or structures) -* type addition forming sum types (enumerated types) +- type equality implying isomorphism +- type multiplication forming product types (tuples or structures) +- type addition forming sum types (enumerated types) Here, **Zero** is a type that corresponds to an **empty type**. We can define an empty enumerated type without any cases; such a type has no constructors, and no values can be constructed, making it empty. **One** corresponds to a type with only one element, which we call the **Unit type**, and its value is a zero-tuple. @@ -310,7 +311,7 @@ fn f[T](t: T) -> (T, Unit) { (t, ()) } fn g[T](pair: (T, Unit)) -> T { pair.0 } ``` -In this context, a type `T` multiplied by $1$ implies that `(T, Unit)` is isomorphic to `T`. We can establish a set of mappings: it's straightforward to go from `T` to `(T, Unit)` by simply adding the zero-tuple. Conversely, going from `(T, Unit)` to `T` involves ignoring the zero-tuple. You can intuitively find that they are isomorphic. +In this context, a type `T` multiplied by $1$ implies that `(T, Unit)` is isomorphic to `T`. We can establish a set of mappings: it's straightforward to go from `T` to `(T, Unit)` by simply adding the zero-tuple. Conversely, going from `(T, Unit)` to `T` involves ignoring the zero-tuple. You can intuitively find that they are isomorphic. ```moonbit enum Nothing {} @@ -329,7 +330,7 @@ fn g[T](t: T) -> PlusZero[T] { CaseT(t) } The property of any type plus zero equals itself means that, for any type, we define an enumerated type `PlusZero`. One case contains a value of type `T`, and the other case contains a value of type `Nothing`. This type is isomorphic to `T`, and we can construct a set of mappings. Starting with `PlusZero`, we use pattern matching to discuss the cases. If the included value is of type `T`, we map it directly to `T`. 
If the type is `Nothing`, this case will never happen because there are no values of type `Nothing`, so we use `abort` to handle it, indicating that the program will terminate. Conversely, we only need to wrap `T` with `CaseT`. It's essential to emphasize that this introduction is quite basic, providing an intuitive feel. Explore further if you are interested. -Here are a few examples. +Here are a few examples. ```moonbit enum Coins { Head; Tail } ``` $\texttt{Coins} = 1 + 1 = 2$ ```moonbit no-check enum DaysOfWeek { Monday; Tuesday; ...; } ``` $\texttt{DaysOfWeek} = 1 + 1 + 1 + 1 + 1 + 1 + 1 = 7$ -The data type for the coin toss can be considered as $1 + 1$, as each case, `Head` and `Tail`, actually represents a set with only one value. Therefore, each case is isomorphic to the Unit type. When combined by the sum type, the `Coin` type becomes $1 + 1 = 2$, representing a set with two values, which is isomorphic to any other type with two values. Similarly, `DaysOfWeek` represents a set of seven values, isomorphic to any other type with seven values. +The data type for the coin toss can be considered as $1 + 1$, as each case, `Head` and `Tail`, actually represents a set with only one value. Therefore, each case is isomorphic to the Unit type. When combined by the sum type, the `Coins` type becomes $1 + 1 = 2$, representing a set with two values, which is isomorphic to any other type with two values. Similarly, `DaysOfWeek` represents a set of seven values, isomorphic to any other type with seven values. -A more interesting example is `List`, using `List[Int]` as an example. +A more interesting example is `List`; let's take `List[Int]` as an example.
$$ \begin{array}{rl} -\texttt{enum} \ \ \texttt{List} & = \texttt{Nil} + \texttt{Int} \times \texttt{List} \\ +\texttt{enum} \ \ \texttt{List} & = \texttt{Nil} + \texttt{Int} \times \texttt{List} \\ & = \texttt{1} + \texttt{Int} \times \texttt{List} \\ & = \texttt{1} + \texttt{Int} \times (\texttt{1} + \texttt{Int} \times \texttt{List} ) \\ & = \texttt{1} + \texttt{Int} \times \texttt{1} + \texttt{Int} \times \texttt{Int} \times \texttt{List} \\ @@ -366,9 +367,9 @@ In this chapter, we explored various custom data types in MoonBit, including: - **Tuples:** Fixed-length combinations of different data types. - **Structures:** Tuples with named fields for better understanding. - **Enumerated Types:** Types that represent a distinct set of values, often used to model different cases or options. - + We also touched upon the concept of algebraic data types, which encompass tuples, structures, and enumerated types, and discussed some basic properties resembling those found in algebra. For further exploration, please refer to: -- _**Category Theory for Programmers**_: [Chapter 6 - Simple Algebraic Data Types](https://bartoszmilewski.com/2015/01/13/simple-algebraic-data-types/) \ No newline at end of file +- ***Category Theory for Programmers***: [Chapter 6 - Simple Algebraic Data Types](https://bartoszmilewski.com/2015/01/13/simple-algebraic-data-types/) diff --git a/docs/05-trees.md b/docs/05-trees.md index ee83c5a..f3c7d11 100644 --- a/docs/05-trees.md +++ b/docs/05-trees.md @@ -2,7 +2,7 @@ In this chapter, we explore a common data structure: trees, and related algorithms. We will start with a simple tree, understand the concept, and then learn about a specialized tree: binary tree. After that, we will explore a specialized binary tree: binary search tree. Further, we will also learn about the balanced binary tree. -Trees are very common plants in our lives, as shown in the diagram.
![trees](/pics/trees.drawio.webp) @@ -18,7 +18,7 @@ If a tree is not empty, it should have exactly one root node, which has only chi In a tree, an edge refers to a pair of nodes $(u, v)$, where either $u$ is the parent node of $v$ or $v$ is the parent node of $u$; simply put, these two nodes should have a parent-child relationship. We use arrows in diagrams to indicate parent-child relationships, with the arrow pointing from an ancestor to its descendant. -The example below is not a tree. +The example below is not a tree. ![](/pics/not-a-tree-en.drawio.webp) @@ -26,25 +26,25 @@ Each red mark violates the requirements of a tree. In the upper right, there's a It's common to place the root node at the top, with child nodes arranged below their parent nodes. We have some terms related to trees. Firstly, the depth of a node corresponds to the length of the path from the root node down to that node. In other words, the number of edges traversed when going from the root node downwards. Therefore, the depth of the root is $0$. Then the height of a node corresponds to the length of the longest path from the node to a leaf node. Likewise, the height of a leaf node is $0$. Finally, there's the height of a tree, which is equivalent to the height of the root node. If a tree has only one node, it is both the root node and a leaf node, with a height of $0$. If a tree is empty, meaning it has no nodes, we define its height as $-1$. However, some books may define it differently, considering the layers of the tree, with the root being the first layer, and so on. -Having discussed the logical structure of a tree, let's now consider its storage structure. While the logical structure defines the relationships between data, the storage structure defines the specific representation of data. We'll use a binary tree as an example, where each node has at most two children. Here, we'll represent the tree using a list of tuples. 
Each tuple defines a parent-child relationship, such as `(0, 1)`, indicating that node $0$ is the parent of node $1$. +Having discussed the logical structure of a tree, let's now consider its storage structure. While the logical structure defines the relationships between data, the storage structure defines the specific representation of data. We'll use a binary tree as an example, where each node has at most two children. Here, we'll represent the tree using a list of tuples. Each tuple defines a parent-child relationship, such as `(0, 1)`, indicating that node $0$ is the parent of node $1$. Another way is to use algebraic data structures we have talked about previously: ```moonbit no-check -Node(0, - Node(1, - Leaf(3), - Empty), +Node(0, + Node(1, + Leaf(3), + Empty), Leaf(2)) ``` -We define several cases using an enumeration type: `Node` represents a regular tree node with its own number and two subtrees, `Leaf` represents a tree with only one node, i.e., a leaf node, having only its own number, and `Empty` represents an empty tree. With this representation, we can define a tree structure similar to before. Of course, this is just one possible implementation. +We define several cases using an enumeration type: `Node` represents a regular tree node with its own number and two subtrees, `Leaf` represents a tree with only one node, i.e., a leaf node, having only its own number, and `Empty` represents an empty tree. With this representation, we can define a tree structure similar to before. Of course, this is just one possible implementation. The final approach is a list where each level's structure is arranged consecutively from left to right: ![](/pics/list-tree.drawio.webp) -For example, the root node is placed at the beginning of the list, followed by nodes of the second level from left to right, then nodes of the third level from left to right, and so on. 
Thus, node $3$ and the node to its right are children of node $1$, while the two nodes after are children of $2$. These three nodes are all empty in the example. +For example, the root node is placed at the beginning of the list, followed by nodes of the second level from left to right, then nodes of the third level from left to right, and so on. Thus, node $3$ and the node to its right are children of node $1$, while the two nodes after are children of $2$. These three nodes are all empty in the example. We can see that all three methods define the same tree, but their storage structures are quite different. Hence, we can conclude that the logical structure of data is independent of its storage structure. @@ -61,7 +61,7 @@ enum IntTree { } ``` -The first algorithm we will discuss is binary tree traversal (or search). Tree traversal refers to the process of visiting all nodes of a tree in a certain order without repetition. Typically, there are two methods of traversal: depth-first and breadth-first. Depth-first traversal always visits one subtree before the other. During the traversal of a subtree, it recursively visits one of its subtrees. Thus, it always reaches the deepest nodes first before returning. For example, +The first algorithm we will discuss is binary tree traversal (or search). Tree traversal refers to the process of visiting all nodes of a tree in a certain order without repetition. Typically, there are two methods of traversal: depth-first and breadth-first. Depth-first traversal always visits one subtree before the other. During the traversal of a subtree, it recursively visits one of its subtrees. Thus, it always reaches the deepest nodes first before returning. For example, ![](/pics/traversal-en.drawio.webp) @@ -69,11 +69,11 @@ In the diagram, we first visit the left subtree, then the left subtree again, le Depth-first traversal usually involves three variations: preorder traversal, inorder traversal, and postorder traversal. 
The difference lies in when the root node is visited while traversing the entire tree. For example, in preorder traversal, the root node is visited first, followed by the left subtree, and then the right subtree. Taking the tree we just saw as an example, this means we start with $0$, then visit the left subtree; when visiting the left subtree, we start from the root node again, which is $1$; then $3$, $4$, $5$, and $2$. In inorder traversal, the left subtree is visited first, followed by the root node, and then the right subtree. Hence, it first visits the left subtree. At this moment, there is still a left subtree, so we go down to tree $3$. Now, it's a leaf node without a left subtree, so we visit the root node $3$. Then we return to visit the root node $1$ of the subtree and proceed to visit the right subtree. Postorder traversal follows a similar logic, visiting the left subtree first, then the right subtree, and finally the root node. In fact, solving the Fibonacci sequence can be seen as a postorder traversal, as we first visit the $(n-1)$-th and $(n-2)$-th items, which are two subtrees, and then solve the $n$-th item, which is the value of the root node. As for breadth-first traversal, we have already explained it: from left to right, the order is `[0, 1, 2, 3, 4, 5]`. -Let's take a look at the specific implementation of these two traversals in terms of finding a specific value in the tree's nodes. Firstly, let's consider depth-first traversal. +Let's take a look at the specific implementation of these two traversals in terms of finding a specific value in the tree's nodes. Firstly, let's consider depth-first traversal. 
```moonbit fn dfs_search(target: Int, tree: IntTree) -> Bool { - match tree { // check the visited tree + match tree { // check the visited tree Empty => false // empty tree implies we are getting deepest Node(value, left, right) => // otherwise, search in subtrees value == target || dfs_search(target, left) || dfs_search(target, right) @@ -85,7 +85,7 @@ As we introduced earlier, this is a traversal based on structural recursion. We ### Queues -Now let's continue with breadth-first traversal. +Now let's continue with breadth-first traversal. ![](/pics/bfs-en.drawio.webp) @@ -97,13 +97,13 @@ A queue is a first-in-first-out (FIFO) data structure. Each time, we dequeue a t Let's take a closer look. Just like lining up in real life, the person who enters the line first gets served first, so it's important to maintain the order of arrival. The insertion and deletion of data follow the same order, as shown in the diagram. We've added numbers from $0$ to $5$ in order. After adding $6$, it follows $5$; and if we delete from the queue, we start from the earliest added $0$. -The queue we're using here is defined by the following interface: +The queue we're using here is defined by the following interface: ```moonbit no-check fn empty[T]() -> Queue[T] // construct an empty queue fn enqueue[T](q: Queue[T], x: T) -> Queue[T] // add element to the tail // attempt to dequeue an element, return None if the queue is empty -fn pop[T](q: Queue[T]) -> (Option[T], Queue[T]) +fn pop[T](q: Queue[T]) -> (Option[T], Queue[T]) ``` -`empty`: construct an empty queue; `enqueue`: add an element to the queue, i.e., add it to the tail; `pop`: attempt to dequeue an element and return the remaining queue. If the queue is already empty, the returned value will be `None` along with an empty queue. For example, +`empty`: construct an empty queue; `enqueue`: add an element to the queue, i.e., add it to the tail; `pop`: attempt to dequeue an element and return the remaining queue. 
If the queue is already empty, the returned value will be `None` along with an empty queue. For example, ```moonbit no-check let q = enqueue(enqueue(empty(), 1), 2) @@ -198,7 +198,7 @@ assert(tail == enqueue(empty(), 2)) We've added $1$ and $2$ to an empty queue. Then, when we try to dequeue an element, we should get `Some(1)`, and what's left should be equivalent to adding $2$ to an empty queue. -Let's return to the implementation of our breadth-first traversal. +Let's return to the implementation of our breadth-first traversal. ```moonbit fn bfs_search(target: Int, queue: Queue[IntTree]) -> Bool { @@ -206,7 +206,7 @@ fn bfs_search(target: Int, queue: Queue[IntTree]) -> Bool { (None, _) => false // If the queue is empty, end the search (Some(head), tail) => match head { // If the queue is not empty, operate on the extracted tree Empty => bfs_search(target, tail) // If the tree is empty, operate on the remaining queue - Node(value, left, right) => + Node(value, left, right) => if value == target { true } else { // Otherwise, operate on the root node and add the subtrees to the queue bfs_search(target, enqueue(enqueue(tail, left), right)) @@ -222,13 +222,13 @@ So far, we've concluded our introduction to tree traversal. However, we may noti ## Binary Search Trees -Previously, we mentioned that searching for elements in a binary tree might require traversing the entire tree. For example, in the diagram below, we attempt to find the element $8$ in the tree. +Previously, we mentioned that searching for elements in a binary tree might require traversing the entire tree. For example, in the diagram below, we attempt to find the element $8$ in the tree. ![](/pics/bst-en.drawio.webp) For the left binary tree, we have to search the entire tree, ultimately concluding that $8$ is not in the tree. -To facilitate searching, we impose a rule on the arrangement of data in the tree based on the binary tree: from left to right, the data is arranged in ascending order. 
This gives rise to the binary search tree. According to the rule, all data in the left subtree should be less than the node's data, and the node's data should be less than the data in the right subtree, as shown in the diagram on the right. +To facilitate searching, we impose a rule on the arrangement of data in the tree based on the binary tree: from left to right, the data is arranged in ascending order. This gives rise to the binary search tree. According to the rule, all data in the left subtree should be less than the node's data, and the node's data should be less than the data in the right subtree, as shown in the diagram on the right. We notice that if we perform an inorder traversal, we can traverse the sorted data from smallest to largest. Searching on a binary search tree is very simple: determine whether the current value is less than, equal to, or greater than the value we are looking for, then we know which subtree should be searched further. In the example above, when we check if $8$ is in the tree, we find that $8$ is greater than $5$, so we should search on the right subtree next. When we encounter $7$, we find that there is no right subtree, meaning there are no numbers greater than $7$, so we conclude that $8$ is not in the tree. As you can see, our search efficiency has greatly improved. In fact, the maximum number of searches we need to perform, in the worst-case scenario, is the height of the tree plus one, rather than the total number of elements. In some cases, the height of the tree may also equal the total number of elements, as we will see later. 
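
The search procedure just described can be sketched in code. The following is a minimal illustration of our own (not the course text's official implementation), assuming the `IntTree` definition from earlier; the name `bst_search` is illustrative.

```moonbit no-check
// A sketch of lookup in a binary search tree, assuming the
// IntTree definition from earlier; bst_search is our own name.
fn bst_search(tree: IntTree, target: Int) -> Bool {
  match tree {
    Empty => false // fell off below a leaf: the value is absent
    Node(v, left, right) =>
      if target == v { true }                          // found it
      else if target < v { bst_search(left, target) }  // smaller values live on the left
      else { bst_search(right, target) }               // larger values live on the right
  }
}
```

Unlike `dfs_search`, only one of the two subtrees is visited at each node, which is exactly why the number of comparisons is bounded by the height of the tree plus one.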
@@ -243,16 +243,16 @@ fn insert(tree: IntTree, value: Int) -> IntTree { match tree { Empty => Node(value, Empty, Empty) // construct a new tree if it's empty Node(v, left, right) => // if not empty, update one subtree by insertion - if value == v { tree } else - if value < v { Node(v, insert(left, value), right) } else - { Node(v, left, insert(right, value)) } + if value == v { tree } else + if value < v { Node(v, insert(left, value), right) } else + { Node(v, left, insert(right, value)) } } } ``` Here we can see the complete insertion code. In line 3, if the original tree is empty, we reconstruct a new tree. In lines 6 and 7, if we need to update the subtree, we use the `Node` constructor to build a new tree based on the updated subtree. -Next, we discuss the delete operation. +Next, we discuss the delete operation. ![](/pics/bst-deletion-en.drawio.webp) @@ -262,7 +262,7 @@ Similarly, we do it with structural recursion. If the tree is empty, it's straig fn remove_largest(tree: IntTree) -> (IntTree, Int) { match tree { Node(v, left, Empty) => (left, v) - Node(v, left, right) => { + Node(v, left, right) => { let (newRight, value) = remove_largest(right) (Node(v, left, newRight), value) } } @@ -283,13 +283,13 @@ Here, we demonstrate part of the deletion of a binary search tree. We define a h ### Balanced Binary Trees -Finally, we delve into the balanced binary trees. When explaining binary search trees, we mentioned that the worst-case number of searches in a binary search tree depends on the height of the tree. Insertion and deletion on a binary search tree may cause the tree to become unbalanced, meaning that one subtree's height is significantly greater than the other's. For example, if we insert elements from $1$ to $5$ in sequence, we'll get a tree as shown in the lower left diagram. +Finally, we delve into the balanced binary trees. 
When explaining binary search trees, we mentioned that the worst-case number of searches in a binary search tree depends on the height of the tree. Insertion and deletion on a binary search tree may cause the tree to become unbalanced, meaning that one subtree's height is significantly greater than the other's. For example, if we insert elements from $1$ to $5$ in sequence, we'll get a tree as shown in the lower left diagram. ![](/pics/worst-bst-en.drawio.webp) We can see that for the entire tree, the height of the left subtree is $-1$ because it's an empty tree, while the height of the right subtree is $3$. In this case, the worst-case number of searches equals the number of elements in the tree, which is $5$. However, if the tree is more balanced, meaning the heights of the two subtrees are similar, as shown in the right diagram, the maximum depth of a node is at most $2$, which is approximately $\log_2n$ times, where $n$ is the number of elements in the tree. As you may recall from the curve of the logarithmic function, when the number of elements in the tree is large, there can be a significant difference in the worst-case search time between the two scenarios. Therefore, we hope to avoid this worst-case scenario to ensure that we always have good query performance. To achieve this, we can introduce a class of data structures called balanced binary trees, where the heights of any node's left and right subtrees are approximately equal. Common types of balanced binary trees include AVL trees, 2-3 trees, or red-black trees. Here, we'll discuss AVL trees, which are relatively simple. -The key to maintaining balance in a binary balanced tree is that when the tree becomes unbalanced, we can rearrange the tree to regain balance. The insertion and deletion of AVL trees are similar to standard binary search trees, except that AVL trees perform adjustments after each insertion or deletion to ensure the tree remains balanced. 
We add a height attribute to the node definition. 
+The key to maintaining balance in a balanced binary tree is that when the tree becomes unbalanced, we can rearrange the tree to regain balance. The insertion and deletion of AVL trees are similar to standard binary search trees, except that AVL trees perform adjustments after each insertion or deletion to ensure the tree remains balanced. We add a height attribute to the node definition.

```moonbit no-check
enum AVLTree {
@@ -311,9 +311,9 @@ After inserting or deleting an element, we traverse back from the modified locat
fn balance(left: AVLTree, z: Int, right: AVLTree) -> AVLTree {
  if height(left) > height(right) + 1 {
    match left {
-      Node(y, left_l, left_r, _) => 
-        if height(left_l) >= height(left_r) { 
-          create(left_l, y, create(lr, z, right)) // x is on y and z's same side 
+      Node(y, left_l, left_r, _) =>
+        if height(left_l) >= height(left_r) {
+          create(left_l, y, create(left_r, z, right)) // x is on the same side as y and z
        } else { match left_r {
          Node(x, left_right_l, left_right_r, _) => // x is between y and z
            create(create(left_l, y, left_right_l), x, create(left_right_r, z, right))
@@ -323,7 +323,7 @@
}
```

-Here is a snippet of code for a balanced tree. You can easily complete the code once you understand what we just discussed. We first determine if a tree is unbalanced, by checking if the height difference between the two subtrees exceeds a specific value and which side is higher. After determining this, we perform a rebalancing operation. At this point, the root node we pass in is $z$, and the higher side after pattern matching is $y$. Then, based on the comparison of the heights of $y$'s two subtrees, we further determine whether $x$ is $y$ and $z$'s same side or between $y$ and $z$, as shown in line 6. Afterwards, we recombine based on the scenarios discussed earlier, as shown in lines 6 and 9. 
Taking insertion of an element as an example: 
+Here is a snippet of code for a balanced tree. You can easily complete the code once you understand what we just discussed. We first determine if a tree is unbalanced by checking whether the height difference between the two subtrees exceeds a specific value and which side is higher. After determining this, we perform a rebalancing operation. At this point, the root node we pass in is $z$, and the higher side after pattern matching is $y$. Then, based on the comparison of the heights of $y$'s two subtrees, we further determine whether $x$ is on the same side as $y$ and $z$ or between $y$ and $z$, as shown in line 6. Afterwards, we recombine based on the scenarios discussed earlier, as shown in lines 6 and 9. Taking insertion of an element as an example:

```moonbit no-check
fn add(tree: AVLTree, value: Int) -> AVLTree {
diff --git a/docs/06-generics-higher-order-functions.md b/docs/06-generics-higher-order-functions.md
index e7b231a..3d9444f 100644
--- a/docs/06-generics-higher-order-functions.md
+++ b/docs/06-generics-higher-order-functions.md
@@ -8,7 +8,7 @@ Programming languages provide us with various means of abstraction, such as func

Let's first look at the stack data structure to understand why and how we use generics.

-A stack is a collection composed of a series of objects, where the insertion and removal of these objects follow the Last-In-First-Out (LIFO) principle. For example, consider the containers stacked on a ship as shown in the left-hand image below. 
+A stack is a collection composed of a series of objects, where the insertion and removal of these objects follow the Last-In-First-Out (LIFO) principle. For example, consider the containers stacked on a ship as shown in the left-hand image below.

![](/pics/stack-objects.drawio.webp)

@@ -47,7 +47,7 @@ Returning to our code, we defined a recursive data structure based on stack oper

The definition of a stack is very similar to that of a list. 
In fact, in the MoonBit built-in library, lists are essentially stacks.

-After defining a stack for integers, we might also want to define stacks for other types, such as a stack of strings. This is simple, and we only demonstrate the code here without explanation. 
+After defining a stack for integers, we might also want to define stacks for other types, such as a stack of strings. This is simple, and we only demonstrate the code here without explanation.

```moonbit
enum StringStack {
@@ -68,7 +68,7 @@ Indeed, the stack of strings looks exactly like the stack of integers, except fo

### Generics in MoonBit

-Therefore, MoonBit provides an important language feature: generics. Generics are about taking types as parameters, allowing us to define more abstract and reusable data structures and functions. For example, with our stack, we can add a type parameter `T` after the name to indicate the actual data type stored. 
+Therefore, MoonBit provides an important language feature: generics. Generics are about taking types as parameters, allowing us to define more abstract and reusable data structures and functions. For example, with our stack, we can add a type parameter `T` after the name to indicate the actual data type stored.

```moonbit
enum Stack[T] {
@@ -89,16 +89,16 @@ Similarly, the functions defined later also have a `T` as a type parameter, repr

### Example: Generic Pair

-We have already introduced the syntax, and we have more examples. 
+We have already introduced the syntax, and we have more examples.

```moonbit
struct Pair[A, B]{ first: A; second: B }
fn identity[A](value: A) -> A { value }
```

-For example, we can define a pair of data, or a tuple. The pair has two type parameters because we might have two elements of two different types. The stored values `first` and `second` are respectively of these two types. As another example, we define a function `identity` that can operate on any type and always return the input value. 
+For example, we can define a pair of data, or a tuple. The pair has two type parameters because we might have two elements of two different types. The stored values `first` and `second` are respectively of these two types. As another example, we define a function `identity` that can operate on any type and always return the input value. -`Stack` and `Pair` can themselves be considered as functions on types, with their parameters being `T` or `A, B`, and the results of the operation are specific types like `Stack[T]` and `Pair[A, B]`. `Stack` and `Pair` can be regarded as type constructors. In most cases, the type parameters in MoonBit can be inferred based on the specific parameter types. +`Stack` and `Pair` can themselves be considered as functions on types, with their parameters being `T` or `A, B`, and the results of the operation are specific types like `Stack[T]` and `Pair[A, B]`. `Stack` and `Pair` can be regarded as type constructors. In most cases, the type parameters in MoonBit can be inferred based on the specific parameter types. ![](/pics/polymorphism-type.webp) @@ -106,18 +106,18 @@ For example, in the screenshot here, the type of `empty` is initially unknown. B ### Example: Generic Functional Queue -Now let's look at another generic data structure: the queue. We have already used the queue in the breadth-first sorting in the last lesson. Recall, a queue is a First-In-First-Out data structure, just like we queue up in everyday life. Here we define the following operations, where the queue is called `Queue`, and it has a type parameter. +Now let's look at another generic data structure: the queue. We have already used the queue in the breadth-first sorting in the last lesson. Recall, a queue is a First-In-First-Out data structure, just like we queue up in everyday life. Here we define the following operations, where the queue is called `Queue`, and it has a type parameter. 
```moonbit no-check
fn empty[T]() -> Queue[T] // Create an empty queue
fn push[T](q: Queue[T], x: T) -> Queue[T] // Add an element to the tail of the queue
// Try to dequeue an element and return the remaining queue; if empty, return itself
-fn pop[T](q: Queue[T]) -> (Option[T], Queue[T]) 
+fn pop[T](q: Queue[T]) -> (Option[T], Queue[T])
```

Every operation has a type parameter, indicating the type of data it holds. We define three operations similar to those of a stack. The difference is that when removing elements, the element that was first added to the queue will be removed.

-The implementation of the queue can be simulated by a list or a stack. We add elements at the end of the list, i.e., at the bottom of the stack, and take them from the front of the list, i.e., the top of the stack. The removal operation is very quick because it only requires one pattern matching. But adding elements requires rebuilding the entire list or stack. 
+The queue can be simulated by a list or a stack. We add elements at the end of the list, i.e., at the bottom of the stack, and take them from the front of the list, i.e., the top of the stack. The removal operation is very quick because it only requires a single pattern match. But adding elements requires rebuilding the entire list or stack.

```moonbit no-check
Cons(1, Cons(2, Nil)) => Cons(1, Cons(2, Cons(3, Nil)))
```

As shown here, to add an element at the end, i.e., to replace `Nil` with `Cons(3, Nil)`, we need to replace the whole `Cons(2, Nil)` with `Cons(2, Cons(3, Nil))`. Worse, the next step would be to replace the `[2]` that occurs as the tail of the original list with `[2, 3]`, which means rebuilding the entire list from scratch. It is very inefficient.

-To solve this problem, we use two stacks to simulate a queue. 
+To solve this problem, we use two stacks to simulate a queue. 
```moonbit no-check
struct Queue[T] {
@@ -144,11 +144,11 @@ One stack is for the removal operation, and the other for storage. In the defini

Let's look at a specific example. Initially, we have an empty queue, so both stacks are empty. After one addition, we add a number to `back`. Then we organize the queue and find that the queue is not empty, but `front` is empty, which does not meet our previously stated invariant, so we rotate the stack `back` and move rotated elements to `front`. Afterwards, we continue to add elements to `back`. Since `front` is not empty, it meets the invariant, and we do not need additional processing.

-After that, our repeatedly additions are only the quick addition of new elements in `back`. Then, we remove elements from `front`. We check the invariant after the operation. We find that the queue is not empty, but `front` is empty, so we do retate `back` and move elements to `front` again. After that, we can normally take elements from `front`. 
+After that, repeated additions are just quick insertions of new elements into `back`. Then, we remove elements from `front`. We check the invariant after the operation. We find that the queue is not empty, but `front` is empty, so we rotate `back` and move elements to `front` again. After that, we can normally take elements from `front`.

You can see that one rotation supports multiple removal operations, therefore the overall cost is much less than rebuilding the list every time. 
-```moonbit +```moonbit struct Queue[T] { front: Stack[T] back: Stack[T] @@ -156,12 +156,12 @@ struct Queue[T] { fn Queue::empty[T]() -> Queue[T] { {front: Empty, back: Empty} } // Store element at the end of the queue -fn push[T](self: Queue[T], value: T) -> Queue[T] { +fn push[T](self: Queue[T], value: T) -> Queue[T] { normalize({ ..self, back: self.back.push(value)}) // By defining the first argument as self, we can use xxx.f() } // Remove the first element -fn pop[T](self: Queue[T]) -> (Option[T], Queue[T]) { +fn pop[T](self: Queue[T]) -> (Option[T], Queue[T]) { match self.front { Empty => (None, self) NonEmpty(top, rest) => (Some(top), normalize({ ..self, front: rest})) @@ -169,7 +169,7 @@ fn pop[T](self: Queue[T]) -> (Option[T], Queue[T]) { } // If front is empty, reverse back to front -fn normalize[T](self: Queue[T]) -> Queue[T] { +fn normalize[T](self: Queue[T]) -> Queue[T] { match self.front { Empty => { front: self.back.reverse(), back: Empty } _ => self @@ -177,7 +177,7 @@ fn normalize[T](self: Queue[T]) -> Queue[T] { } // Helper function: reverse the stack -fn reverse[T](self: Stack[T]) -> Stack[T] { +fn reverse[T](self: Stack[T]) -> Stack[T] { fn go(acc, xs: Stack[T]) { match xs { Empty => acc @@ -231,12 +231,12 @@ fn fold_right[A, B](list: @immut/list.T[A], f: (A, B) -> B, b: B) -> B { } ``` -Here’s another example. If we want to repeat a function’s operation, we could define `repeat` as shown in the first line. `repeat` accepts a function as a parameter and then returns a function as a result. Its operation results in a function that calculates the original function twice. +Here’s another example. If we want to repeat a function’s operation, we could define `repeat` as shown in the first line. `repeat` accepts a function as a parameter and then returns a function as a result. Its operation results in a function that calculates the original function twice. 
```moonbit
-fn repeat[A](f: (A) -> A) -> (A) -> A { 
+fn repeat[A](f: (A) -> A) -> (A) -> A {
  fn (a) { f(f(a)) } // Return a function as a result
-} 
+}

fn plus_one(i: Int) -> Int { i + 1 }
fn plus_two(i: Int) -> Int { i + 2 }
@@ -258,19 +258,19 @@ $\mapsto$ `fn (a) { plus_one(plus_one(a)) }`

  `add_two(2)`

-$\mapsto$ `plus_one(plus_one(2))` 
+$\mapsto$ `plus_one(plus_one(2))`

-$\mapsto$ `plus_one(2) + 1` 
+$\mapsto$ `plus_one(2) + 1`

-$\mapsto$ `(2 + 1) + 1` 
+$\mapsto$ `(2 + 1) + 1`

-$\mapsto$ `3 + 1` 
+$\mapsto$ `3 + 1`

$\mapsto$ `4`

Let's explore the simplification here. First, `add_two` is bound to `repeat(plus_one)`. For this line, simplification amounts to replacing identifiers in the expression with the arguments, obtaining a function as a result. Now, we cannot simplify further for this expression. Then, we calculate `add_two(2)`. Similarly, we replace identifiers in the expression and simplify `plus_one`. After more simplifications, we finally obtain our result, `4`.

-We've previously mentioned function types, which go from the accepted parameters to the output parameters, where the accepted parameters are enclosed in parentheses. 
+We've previously mentioned function types, which go from the parameter types to the return type, where the parameter types are enclosed in parentheses.

- `(Int) -> Int` Integers to integers
- `(Int) -> (Int) -> Int` Integers to a function that accepts integers and returns integers

@@ -281,7 +281,7 @@ For example, the function type from integer to integer, would be `(Int) -> Int`.

### Example: Fold Functions

-Here are a few more common applications of higher-order functions. Higher-order functions are functions that accept functions. `fold_right`, which we just saw, is a common example. 
Below, we draw its expression tree. 
+Here are a few more common applications of higher-order functions. Higher-order functions are functions that accept functions. `fold_right`, which we just saw, is a common example. Below, we draw its expression tree.

```moonbit no-check
fn fold_right[A, B](list: @immut/list.T[A], f: (A, B) -> B, b: B) -> B {
@@ -294,7 +294,7 @@ fn fold_right[A, B](list: @immut/list.T[A], f: (A, B) -> B, b: B) -> B {

![](/pics/fold_right.drawio.webp)

-You can see that for a list from 1 to 3, `f` is applied to the current element and the result of the remaining elements each time, thus it looks like we're building a fold from right to left, one by one, to finally get a result. Therefore, this function is called `fold_right`. If we change the direction, folding the list from left to right, then we get `fold_left`. 
+You can see that for a list from 1 to 3, `f` is applied to the current element and the result of the remaining elements each time, thus it looks like we're building a fold from right to left, one by one, to finally get a result. Therefore, this function is called `fold_right`. If we change the direction, folding the list from left to right, then we get `fold_left`.

```moonbit
fn fold_left[A, B](list: @immut/list.T[A], f: (B, A) -> B, b: B) -> B {
@@ -311,7 +311,7 @@ Here, we only need to swap the order, first processing the current element with

### Example: Map Function

-Another common application of higher-order functions is to map each element of a function. 
+Another common application of higher-order functions is to use a function to map each element of a list.

```moonbit no-check
struct PersonalInfo { name: String; age: Int }
@@ -325,9 +325,9 @@ let infos: @immut/list.T[PersonalInfo] = ???
let names: @immut/list.T[String] = infos.map(fn (info) { info.name })
```

-For example, if we have some people's information and we only need their names, then we can use the mapping function `map`, which accepts `f` as a parameter, to map each element in the list one by one, finally obtaining a new list where the type of elements has become `B`. This function's implementation is very simple. What we need is also structural recursion. 
The last application is as shown in line 8. Maybe you feel like you've seen this `map` structure before: structural recursion, a default value for the empty case, and a binary operation processing the current value combined with the recursive result when not empty. Indeed, `map` can be entirely implemented using `fold_right`, where the default value is an empty list, and the binary operation is the `Cons` constructor. +For example, if we have some people's information and we only need their names, then we can use the mapping function `map`, which accepts `f` as a parameter, to map each element in the list one by one, finally obtaining a new list where the type of elements has become `B`. This function's implementation is very simple. What we need is also structural recursion. The last application is as shown in line 8. Maybe you feel like you've seen this `map` structure before: structural recursion, a default value for the empty case, and a binary operation processing the current value combined with the recursive result when not empty. Indeed, `map` can be entirely implemented using `fold_right`, where the default value is an empty list, and the binary operation is the `Cons` constructor. -```moonbit +```moonbit fn map[A, B](list: @immut/list.T[A], f: (A) -> B) -> @immut/list.T[B] { fold_right(list, fn (value, cumulator) { Cons(f(value), cumulator) }, Nil) } @@ -335,7 +335,7 @@ fn map[A, B](list: @immut/list.T[A], f: (A) -> B) -> @immut/list.T[B] { Here we leave you an exercise: how to implement `fold_left` with `fold_right`? Hint: something called `Continuation` may be involved. `Continuation` represents the remaining computation after the current operation, generally a function whose parameter is the current value and whose return value is the overall program's result. 
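To make the hint slightly more concrete without giving the answer away, here is a tiny illustration of passing a continuation. The helper `add_then` and the parameter `k` are our own hypothetical names, not definitions from the course:

```moonbit
// `k` is the continuation: it receives the result of the current
// step and carries out the rest of the computation.
fn add_then[A](x: Int, y: Int, k: (Int) -> A) -> A {
  k(x + y)
}

// add_then(1, 2, fn (sum) { sum * 2 }) hands the sum 3 to the
// continuation, which doubles it, so the whole expression is 6.
```

Notice that the caller decides what "the rest of the program" is by choosing `k`; this is the mechanism the exercise asks you to exploit.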
-Having learned about generics and higher-order functions, we can now define the binary search tree studied in the last lesson as a more general binary search tree, capable of storing various data types, not just integers. +Having learned about generics and higher-order functions, we can now define the binary search tree studied in the last lesson as a more general binary search tree, capable of storing various data types, not just integers. ```moonbit no-check enum Tree[T] { @@ -359,4 +359,4 @@ In this chapter, we introduced the concepts of generics and functions as first-c For further exploration, please refer to: - _**Software Foundations, Volume 1: Logical Foundations**_: Poly; or -- _**Programming Language Foundations in Agda**_: Lists \ No newline at end of file +- _**Programming Language Foundations in Agda**_: Lists diff --git a/docs/07-imperative-programming.md b/docs/07-imperative-programming.md index da7482c..c70daee 100644 --- a/docs/07-imperative-programming.md +++ b/docs/07-imperative-programming.md @@ -89,6 +89,7 @@ fn init { ``` The distinction between mutable and immutable data is important because it affects how we think about the data. As shown in the following diagrams, for mutable data, we can think of identifiers as boxes that hold values. + - In the first diagram, when we modify a mutable variable, we are essentially updating the value stored in the box. - In the second diagram, we use `let` to bind the identifier `ref` to a struct. Thus, the box contains a reference to the struct. When we modify the value in the struct using `ref`, we are updating the value stored in the struct which it points to. The reference itself does not change because it still points to the same struct. - In the third diagram, when we define a mutable `ref` and modify it, we are creating a new box and updating the reference to point to the new box. 
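The three diagrams above can be sketched in code. This is a minimal sketch; the struct name `Ref` and its field `val` are illustrative, not the chapter's exact definitions:

```moonbit
struct Ref { mut val : Int }

fn init {
  let mut x = 1                // first diagram: a box holding the value 1
  x = 2                        // the value stored in the box is updated
  let r : Ref = { val: 1 }     // second diagram: the box holds a reference to a struct
  r.val = 2                    // the struct's field changes; `r` still points to the same struct
  let mut s : Ref = { val: 1 } // third diagram: a mutable binding to a reference
  s = { val: 2 }               // `s` now points to a brand-new struct
}
```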
@@ -99,7 +100,6 @@ The distinction between mutable and immutable data is important because it affec Multiple identifiers pointing to the same mutable data structure can be considered aliases, which need to be handled carefully. - In the following example, the `alter` function takes two mutable references to `Ref` structs, `a` and `b`, and modifies the `val` field of `a` to `10` and the `val` field of `b` to `20`. When we call `alter(x, x)`, we are essentially passing the same mutable reference, `x`, twice. As a result, the `val` field of `x` will be changed twice, as both `a` and `b` are just aliases referring to the same `x` reference. ```moonbit @@ -257,4 +257,4 @@ fn fib_mut(n: Int) -> Int { ## Summary -In this chapter, we've explored the basics of imperative programming. We've learned about using commands to tell the computer what to do, variables to store values, and loops to repeat actions. Imperative programming is inherently different from functional programming, and it's important to understand the trade-offs between the two. By understanding these concepts, we can choose the right tools for the job and write programs that are both effective and easy to understand. \ No newline at end of file +In this chapter, we've explored the basics of imperative programming. We've learned about using commands to tell the computer what to do, variables to store values, and loops to repeat actions. Imperative programming is inherently different from functional programming, and it's important to understand the trade-offs between the two. By understanding these concepts, we can choose the right tools for the job and write programs that are both effective and easy to understand. diff --git a/docs/08-queues.md b/docs/08-queues.md index f2eacf0..98964d0 100644 --- a/docs/08-queues.md +++ b/docs/08-queues.md @@ -178,7 +178,7 @@ fn push[T](self: LinkedList[T], value: T) -> LinkedList[T] { } ``` -The following diagram is a simple demonstration. 
When we create a linked list by calling `make()`, both the `head` and `tail` are empty. When we push an element using `push(1)`, we create a new node and point both the `head` and `tail` to this node. When we push more elements, say `push(2)` and then `push(3)`, we need to update the `next` field of the current `tail` node to point to the new node. The `tail` node of the linked list should always point to the latest node. +The following diagram is a simple demonstration. When we create a linked list by calling `make()`, both the `head` and `tail` are empty. When we push an element using `push(1)`, we create a new node and point both the `head` and `tail` to this node. When we push more elements, say `push(2)` and then `push(3)`, we need to update the `next` field of the current `tail` node to point to the new node. The `tail` node of the linked list should always point to the latest node. ![](/pics/linked_list.drawio.webp) @@ -243,4 +243,4 @@ The optimized `length_` function uses tail recursion to calculate the length of ## Summary -This chapter covers the design and implementation of two basic types of queues, i.e., circular queues and singly linked lists. It also highlights the importance of understanding and utilizing tail calls and tail recursion to optimize recursive functions and prevent stack overflow, ultimately leading to more efficient and stable program performance. \ No newline at end of file +This chapter covers the design and implementation of two basic types of queues, i.e., circular queues and singly linked lists. It also highlights the importance of understanding and utilizing tail calls and tail recursion to optimize recursive functions and prevent stack overflow, ultimately leading to more efficient and stable program performance. 
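The accumulator-based tail recursion described for `length_` can be sketched as follows. This is a hedged re-creation of the idea, not necessarily the chapter's exact code:

```moonbit no-check
fn length[T](list : @immut/list.T[T]) -> Int {
  fn go(l : @immut/list.T[T], acc : Int) -> Int {
    match l {
      Nil => acc                         // the accumulator already holds the length
      Cons(_, rest) => go(rest, acc + 1) // tail call: nothing remains to do after it
    }
  }
  go(list, 0)
}
```

Because the recursive call is the last action in its branch, the compiler can reuse the current stack frame, so even very long lists do not overflow the stack.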
diff --git a/docs/09-traits.md b/docs/09-traits.md index 57a0c95..d38a712 100644 --- a/docs/09-traits.md +++ b/docs/09-traits.md @@ -28,7 +28,6 @@ fn make[T]() -> Queue[T] { In fact, we have already encountered a similar situation in Chapter 6. When implementing a generic binary search tree, we need a comparison function to determine the order of values. As illustrated in the code below, we can pass the comparison function as a parameter. While this approach is effective, it can become cumbersome when dealing with more intricate type requirements. - ```moonbit no-check enum Tree[T] { Empty @@ -42,6 +41,7 @@ fn delete[T](self: Tree[T], value: T, compare: (T, T) -> Int) -> Tree[T] ``` The above examples share a common characteristic, that is, the functions are associated with the type `T`. For instance, we might require the following functions for `T`: + - Compare two `T` values: `fn T::compare(self: T, other: T) -> Int` - Get the default value of `T`: `fn T::default() -> T` - Get the string representation of a `T` value: `fn T::to_string(self: T) -> String` @@ -70,10 +70,10 @@ In generic functions, we use traits as bounds to specify what methods a type sup ```moonbit no-check fn make[T: Default]() -> Queue[T] { // `T` should support the `default` method. - { + { array: Array::make(5, T::default()), // The return type of `default` is `T`. - start: 0, end: 0, length: 0 - } + start: 0, end: 0, length: 0 + } } ``` @@ -87,8 +87,8 @@ Using the same approach, we can reimplement the `insert` method for `Tree`. In p fn insert[T : Compare](tree : Tree[T], value : T) -> Tree[T] { // Since `T` is bound by `Compare`, it should support the `compare` method. match tree { - Empty => Node(value, Empty, Empty) - Node(v, left, right) => + Empty => Node(value, Empty, Empty) + Node(v, left, right) => if T::compare(value, v) == 0 { // We can call `compare` here. tree } else if T::compare(value, v) < 0 { // We can call `compare` here. 
@@ -175,7 +175,7 @@ fn BoxedInt::plus_one(b: BoxedInt) -> BoxedInt { { value : b.value + 1 } } // `::` can be omitted when the first parameter is named `self`. -fn plus_two(self: BoxedInt) -> BoxedInt { +fn plus_two(self: BoxedInt) -> BoxedInt { { value : self.value + 2} } @@ -197,18 +197,25 @@ type MyMap[Key, Value] ``` A map should support the following methods: + - Create a map. + ```moonbit no-check fn make[Key, Value]() -> MyMap[Key, Value] ``` + - Add a key-value pair, or update the corresponding value of a key. + ```moonbit no-check fn put[Key, Value](map: MyMap[Key, Value], key: Key, value: Value) -> MyMap[Key, Value] ``` + - Get the corresponding value of a key. + ```moonbit no-check fn get[Key, Value](map: MyMap[Key, Value], key: Key) -> Option[Value] ``` + Since such a key-value pair may not exist in the map, the return value is wrapped in `Option`. The map can be implemented using a list of pairs. @@ -218,21 +225,26 @@ type MyMap[Key, Value] @immut/list.T[(Key, Value)] ``` The first two basic methods, `make` and `put`, can be easily implemented as follows: + - Create a map by creating an empty list. + ```moonbit - fn make[Key, Value]() -> MyMap[Key, Value] { + fn make[Key, Value]() -> MyMap[Key, Value] { MyMap(Nil) } ``` + - Add/update a key-value pair by inserting the pair to the beginning of the list. + ```moonbit - fn put[Key, Value](map: MyMap[Key, Value], key: Key, value: Value) -> MyMap[Key, Value] { + fn put[Key, Value](map: MyMap[Key, Value], key: Key, value: Value) -> MyMap[Key, Value] { let MyMap(original_map) = map MyMap( Cons( (key, value), original_map ) ) } ``` The third method, `get`, is also easy to describe in prose: + - Search the list from the beginning until the first matching key is found. In such an implementation of the `get` function, we need to compare the key we are searching for with the keys stored in the map to determine if they are equal. As a result, the `Key` type must implement the `Eq` trait. 
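For illustration, such an `Eq`-bounded lookup over the pair list could look like this. This is a sketch assuming the `MyMap` wrapper shown earlier:

```moonbit no-check
fn get[Key : Eq, Value](map : MyMap[Key, Value], key : Key) -> Option[Value] {
  let MyMap(list) = map
  match list {
    Nil => None
    Cons((k, v), rest) =>
      // the `Key : Eq` bound is what makes `==` available here
      if k == key { Some(v) } else { get(MyMap(rest), key) }
  }
}
```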
That is, we need to modify the previous declaration of `get` so that `Key` is bound by `Eq`. @@ -313,7 +325,8 @@ When matching, it is important to note that the key needs to be a literal value. ## Summary In this chapter, we learned how to + - Define traits and use them to bound type parameters - Implement traits implicitly or explicitly - Implement custom operators -- Implement a simple map using traits in MoonBit \ No newline at end of file +- Implement a simple map using traits in MoonBit diff --git a/docs/10-hash-maps-closures.md b/docs/10-hash-maps-closures.md index 8f2e8f5..fd3cb4e 100644 --- a/docs/10-hash-maps-closures.md +++ b/docs/10-hash-maps-closures.md @@ -4,13 +4,13 @@ A map or table is a collection of key-value pairs that bind keys to values, where each key is unique. A simple implementation of a map is a list of tuples, where each tuple is a key-value pair. We add a new key-value pair to the head of the list, and traverse from the head of the list for lookup operations. -Another implementation is based on the balanced binary tree (BBT) we introduced in [Chapter 5](./trees). We just need to modify the BBT so that each node now stores a key-value pair. In tree operations, we compare the first parameter of the key-value pair with the key we want to operate on. +Another implementation is based on the balanced binary tree (BBT) we introduced in [Chapter 5](./trees). We just need to modify the BBT so that each node now stores a key-value pair. In tree operations, we compare the first parameter of the key-value pair with the key we want to operate on. ## Hash Maps ### Hash Function -First, what is a hash function or hashing? A hash function maps or binds data of arbitrary length to data of fixed length. For example, you may have heard of the MD5 algorithm, which maps files of any size or format to a short 128-bit digest (compressed data representation). +First, what is a hash function or hashing? 
A hash function maps or binds data of arbitrary length to data of fixed length. For example, you may have heard of the MD5 algorithm, which maps files of any size or format to a short 128-bit digest (compressed data representation). For the `Hash` interface in MoonBit, data is mapped to values in the range of integers. For example, the string "ThisIsAVeryVeryLongString" will be mapped to the integer -321605584. @@ -30,11 +30,11 @@ a[ index ] = value // add or update data let value = a[ index ] // look up data ``` -Suppose we have an array of key-value pairs and want to add, update, or look up data. We first calculate the hash value based on the key. Since hash values can be any integer, we use modulo to map a hash value to an array index and then look up or update data with the corresponding array index. However, as mentioned earlier, this is the ideal scenario because hash collisions may occur. +Suppose we have an array of key-value pairs and want to add, update, or look up data. We first calculate the hash value based on the key. Since hash values can be any integer, we use modulo to map a hash value to an array index and then look up or update data with the corresponding array index. However, as mentioned earlier, this is the ideal scenario because hash collisions may occur. ## Hash Collision -According to the [pigeonhole principle](https://en.wikipedia.org/wiki/Pigeonhole_principle) or [birthday problem](https://en.wikipedia.org/wiki/Birthday_problem), the amount of data we map may exceed the range of integers, and the hash value may far exceed the valid array indices. For example, we obviously can't directly allocate an array with 2.1 billion slots, and then collisions will occur where multiple data have the same array index (different pieces of data may have the same hash value, and different hash values may be mapped to the same index in an array). There are several ways to handle hash collisions. 
+According to the [pigeonhole principle](https://en.wikipedia.org/wiki/Pigeonhole_principle) or [birthday problem](https://en.wikipedia.org/wiki/Birthday_problem), the amount of data we map may exceed the range of integers, and the hash value may far exceed the valid array indices. For example, we obviously can't directly allocate an array with 2.1 billion slots, and then collisions will occur where multiple data have the same array index (different pieces of data may have the same hash value, and different hash values may be mapped to the same index in an array). There are several ways to handle hash collisions. One approach is **direct addressing**. When data must be stored in the slot corresponding to the array index we calculated, different pieces of data might be stored in the same slot causing issues. So, we use another data structure in each slot to store items hashed to the same index. Possible data structures include lists, balanced binary trees, and the original array turns into an array of lists or trees. @@ -72,7 +72,7 @@ For the add/update operation, we first calculate the position to store the key b The following code demonstrates adding and updating data. We first calculate the hash value of the key at line 2 with the hash interface specified in `K : Hash` at line 1. Then we find and traverse the corresponding data structure. We're using a mutable data structure with an infinite while loop at line 4. We break out of the loop if we find the key already exists or reach the end of the list. If the key is found, we update the data in place. Otherwise, we update the bucket to be the remaining list so the loop terminates. When we reach the end of the list and haven't found the key, we add a new pair of data at the end of the list. At last, we check if it needs resizing based on the current load factor. 
```moonbit -let load = 0.75 +let load = 0.75 fn resize() -> Unit {} // placeholder for resize implementation fn put[K : Hash + Eq, V](map : HT_bucket[K, V], key : K, value : V) -> Unit { @@ -144,7 +144,7 @@ struct HT_open[K, V] { For the add/update operation, we calculate the index to add/update data based on the hash value of the key. If the slot is not empty, we further check if it's the key we're looking for. If so, we update the value; otherwise, we keep probing backward and store the key-value pair once we find an empty slot. Here, we can assume an empty slot exists as we resize the array when needed. Note that the "backward" traversal here is the same as that in a circular queue. If the index exceeds the length of the array, we go back to the beginning of the array. -We can define a helper method to check if a key already exists. If so, we directly return its index; otherwise we return the index of the next empty slot. +We can define a helper method to check if a key already exists. If so, we directly return its index; otherwise we return the index of the next empty slot. ```moonbit no-check // Probe to the right of the index of the original hash, return the index of the first empty slot @@ -180,13 +180,13 @@ fn put[K : Hash + Eq + Default, V : Default](map : HT_open[K, V], key : K, value } ``` -The remove operation is more complicated. Recall that we have an invariant to maintain: there should be no empty slots between the original slot and the slot where the key-value pair is actually stored. As shown below, if we add 0, 1, 5, and 3 sequentially and then remove 1, there will be a gap between 0 and the position of 5 which violates the invariant and we won't be able to correctly look up 5. +The remove operation is more complicated. Recall that we have an invariant to maintain: there should be no empty slots between the original slot and the slot where the key-value pair is actually stored. 
As shown below, if we add 0, 1, 5, and 3 sequentially and then remove 1, there will be a gap between 0 and the position of 5 which violates the invariant and we won't be able to correctly look up 5. -A simple solution is to define a special state that marks a slot as "deleted" to ensure subsequent data can still be reached and found. Another solution is to check if any element from the slot of data removal to the next empty slot needs to move location so as to maintain the invariant. Here we demonstrate the simpler marking method, also known as "tombstone". +A simple solution is to define a special state that marks a slot as "deleted" to ensure subsequent data can still be reached and found. Another solution is to check if any element from the slot of data removal to the next empty slot needs to move location so as to maintain the invariant. Here we demonstrate the simpler marking method, also known as "tombstone". ![height:320px](/pics/open_address_delete_en.drawio.webp) -We define a new `Status` enum consisting of `Empty`, `Occupied` and `Deleted`, and update the type of the occupied array from boolean value to Status. +We define a new `Status` enum consisting of `Empty`, `Occupied` and `Deleted`, and update the type of the occupied array from boolean value to Status. ```moonbit enum Status { @@ -203,7 +203,7 @@ struct HT_open[K, V] { } ``` -Let's also update the helper function so that during key or empty slot lookup, we record the first empty slot that can be denoted by status `Empty` or `Deleted` to reuse the slot after data removal. However, we still need to find the next Empty slot to determine if the key does not exist. We use a simple variable named `empty` to record this. A negative value means we haven't found an empty slot yet, and we update the value to the index of the next empty slot if we find one. It also means we've encountered an empty slot if the loop ends, and then we decide what to return based on the variable `empty`. 
+Let's also update the helper function so that during key or empty slot lookup, we record the first empty slot that can be denoted by status `Empty` or `Deleted` to reuse the slot after data removal. However, we still need to find the next Empty slot to determine if the key does not exist. We use a simple variable named `empty` to record this. A negative value means we haven't found an empty slot yet, and we update the value to the index of the next empty slot if we find one. It also means we've encountered an empty slot if the loop ends, and then we decide what to return based on the variable `empty`. ```moonbit // Probe to the right of the index of the original hash, return the index of the first empty slot @@ -237,15 +237,15 @@ fn remove[K : Hash + Eq + Default, V : Default](map : HT_open[K, V], key : K) -> } ``` -Next, let's introduce another implementation of open addressing: rearrange elements after each removal to compress the lookup path. Suppose we still add 0, 1, 5, 3 sequentially and then remove 1, we can see that the invariant holds for elements before 1, but cannot be sure if it also holds for elements after it. These elements might have been originally stored here or stored here due to the original slot was occupied and this is the next empty slot. Therefore, a check is required. +Next, let's introduce another implementation of open addressing: rearrange elements after each removal to compress the lookup path. Suppose we still add 0, 1, 5, 3 sequentially and then remove 1, we can see that the invariant holds for elements before 1, but cannot be sure if it also holds for elements after it. These elements might have been originally stored here or stored here due to the original slot was occupied and this is the next empty slot. Therefore, a check is required. First, we check element 5 and notice that 5 should be mapped to index 0, but is stored in the current slot to handle hash collision. 
Now that element 1 has been removed, the invariant no longer holds as there's an empty slot between indices 0 and 2. To solve this, we need to move element 5 forward to the index previously storing element 1. Then we check element 3 and it's in the slot it should be mapped to, so we do not move it. We encounter an empty slot after element 3. The elements after the empty spot won't be affected, so we stop checking. ![](/pics/rearrange_en.drawio.webp) -Let's look at another example as follows: we have an array of size 10, so a number that ends in *n* will be mapped to index *n* with modulo, like the index for element 0 is 0, for element 11 is 1, for element 13 is 3, etc. We will remove the data at index 1 and rearrange the elements in the hash map. We check the elements at index 1 to 5 and: +Let's look at another example as follows: we have an array of size 10, so a number that ends in *n* will be mapped to index *n* with modulo, like the index for element 0 is 0, for element 11 is 1, for element 13 is 3, etc. We will remove the data at index 1 and rearrange the elements in the hash map. We check the elements at index 1 to 5 and: -We find element 11 should be stored at index 1 if there were no hash collision. After removing the data at index 1, we now have an empty slot at index 1 and can move element 11 to it. Then we check element 3 and it's already in the slot it should be mapped to. Next, we check element 21 which should be stored at index 1, but now we see a gap between slot 1 to the actual slot element 21 is stored. This is caused by moving element 11 earlier, so also move element 21 forward. Lastly, we check element 13 which should be stored at index 3. Now there's a gap after moving element 21, so we move element 13 forward as well. +We find element 11 should be stored at index 1 if there were no hash collision. After removing the data at index 1, we now have an empty slot at index 1 and can move element 11 to it. 
Then we check element 3 and it's already in the slot it should be mapped to. Next, we check element 21 which should be stored at index 1, but now we see a gap between slot 1 and the slot where element 21 is actually stored. This is caused by moving element 11 earlier, so we also move element 21 forward. Lastly, we check element 13 which should be stored at index 3. Now there's a gap after moving element 21, so we move element 13 forward as well.

Now, the invariant holds again: there should be no empty slots between the original slot and the slot where the key-value pair is actually stored. The detailed implementation is left as an exercise and feel free to give it a try!

@@ -253,7 +253,7 @@

## Closure

-It's time for the last topic in this lecture! What is a closure? A closure is the combination of a function bundled together with references to its surrounding state. Its surrounding state is determined by the lexical environment. For example, in the following code, when we define the function at line 3, the `i` here corresponds to the `i` at line 2. Therefore, when we call `println_i` later at line 3, it outputs the value of `i` from line 2. Then we update `i` at line 4, and the output will also be updated accordingly.
However, when we introduce another `i` at line 7, although the variable names are the same, the new variable `i` has nothing to do with our closure, so the output at line 8 will not change. The environment captured by the closure corresponds to the program structure and is determined at code definition, but not runtime. @@ -272,7 +272,7 @@ fn init { ### Data Encapsulation -We can use closures to encapsulate data and behavior. Variables defined inside a function cannot be accessed from anywhere outside the function, because it's only in the scope of the function. Let's define two functions that capture the value as return value, enabling users to get and set value as shown at lines 4 and 5. +We can use closures to encapsulate data and behavior. Variables defined inside a function cannot be accessed from anywhere outside the function, because it's only in the scope of the function. Let's define two functions that capture the value as return value, enabling users to get and set value as shown at lines 4 and 5. Also, we can add data validation in the functions. User operation is unrestricted if we directly define a mutable field in a structure, but now with validation we can filter illegitimate input. Lastly, we return these two functions. From the results of `get()` we can see that a legitimate input will update the value of the captured variable via the function, while illegitimate input is filtered out. @@ -295,8 +295,8 @@ fn init { ``` We can also use closures with structs to encapsulate the hash map behavior and define an abstract data structure. We previously showed implementations of open addressing and direct addressing, but this does not matter for users as they have the same effect. -In this case, we can define a struct `MyMap` that has four functions, which all capture the same hash map and allow modifications. Then, we provide two functions to construct this struct, offering implementations of both open addressing and direct addressing. 
As an exercise, think about how we can implement it with a simple list or tree, etc. -Lastly, let's use this struct. We only need to replace the initialization function, and the rest of the code remains unchanged when using different implementations. +In this case, we can define a struct `MyMap` that has four functions, which all capture the same hash map and allow modifications. Then, we provide two functions to construct this struct, offering implementations of both open addressing and direct addressing. As an exercise, think about how we can implement it with a simple list or tree, etc. +Lastly, let's use this struct. We only need to replace the initialization function, and the rest of the code remains unchanged when using different implementations. ```moonbit struct MyMap[K, V] { @@ -306,6 +306,7 @@ struct MyMap[K, V] { size : () -> Int } ``` + ```moonbit no-check // Implementation of open addressing fn MyMap::hash_open_address[K : Hash + Eq + Default, V : Default]() -> MyMap[K, V] { ... } @@ -322,7 +323,7 @@ fn init { } ``` -Here is the main code snippet. We implement the `map` table inside `hash_bucket`, then capture it in multiple functions, store these functions in a struct, and return it. +Here is the main code snippet. We implement the `map` table inside `hash_bucket`, then capture it in multiple functions, store these functions in a struct, and return it. ```moonbit no-check fn MyMap::hash_bucket[K : Hash + Eq, V]() -> MyMap[K, V] { @@ -359,6 +360,7 @@ fn MyMap::contains[K, V](map : MyMap[K, V], key : K) -> Bool { } } ``` + ```moonbit no-check fn init { let map : MyMap[Int, Int] = MyMap::hash_bucket() @@ -370,5 +372,6 @@ fn init { ## Summary We introduced two ways to implement a hash map with direct addressing and open addressing. Meanwhile, we talked about the concept of a closure and how to use it for encapsulation. 
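The encapsulation pattern from this chapter can be condensed into the classic counter closure. This is a minimal sketch with our own names, not code from the course:

```moonbit
fn make_counter() -> () -> Int {
  let mut count = 0
  fn () {
    count = count + 1 // the returned function captures and owns `count`
    count
  }
}
```

Each function returned by `make_counter` increments its own private `count`; nothing outside the closure can read or reset it, which is exactly the data-hiding property used for `MyMap` above.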
To better understand the algorithms, the following readings are recommended: -- _**Introduction to Algorithms**_: Chapter 11 - Hash Tables; or -- _**Algorithms**_: Section 3.4 - Hash Tables \ No newline at end of file + +- ***Introduction to Algorithms***: Chapter 11 - Hash Tables; or +- ***Algorithms***: Section 3.4 - Hash Tables diff --git a/docs/11-parser.md b/docs/11-parser.md index 322aab1..6fed4d5 100644 --- a/docs/11-parser.md +++ b/docs/11-parser.md @@ -25,7 +25,7 @@ Divide = "/" Whitespace = " " ``` -Let's take integers and the plus sign for examples. Each line in the lexical rules corresponds to a pattern-matching rule. Content within quotes means matching a string of the same content. Rule `a b` means matching rule `a` first, and if it succeeds, continue to pattern match rule `b`. Rule `a / b` means matching rule `a` or `b`, try matching `a` first, and then try matching rule `b` if it fails. Rule `*a ` with an asterisk in front refers to zero or more matches. Lastly, `%x` means matching a UTF-encoded character, where `x` indicates it's in hexadecimal. For example, `0x30` corresponds to the 48th character `0`, and it is `30` in hexadecimal. With this understanding, let's examine the definition rules. Plus is straightforward, representing the plus sign. Number corresponds to zero or a character from 1-9 followed by zero or more characters from 0-9. +Let's take integers and the plus sign for examples. Each line in the lexical rules corresponds to a pattern-matching rule. Content within quotes means matching a string of the same content. Rule `a b` means matching rule `a` first, and if it succeeds, continue to pattern match rule `b`. Rule `a / b` means matching rule `a` or `b`, try matching `a` first, and then try matching rule `b` if it fails. Rule `*a` with an asterisk in front refers to zero or more matches. Lastly, `%x` means matching a UTF-encoded character, where `x` indicates it's in hexadecimal. 
For example, `0x30` corresponds to the 48th character `0`, and it is `30` in hexadecimal. With this understanding, let's examine the definition rules. Plus is straightforward, representing the plus sign. Number corresponds to zero or a character from 1-9 followed by zero or more characters from 0-9. ![](/pics/lex_rail.drawio.webp) @@ -74,13 +74,13 @@ test { With this simple parser, we can already handle most tokens, including parentheses, arithmetic operators, and whitespaces. Here, we also define them using anonymous functions and directly try pattern matching all possibilities. It returns `true` if the character is something we want to match; otherwise, it returns `false`. It's the same for whitespaces. However, simply parsing the input into characters isn't enough since we want to obtain more specific enum values, so we'll need to define a mapping function. ```moonbit expr -let symbol: Lexer[Char] = pchar(fn{ +let symbol: Lexer[Char] = pchar(fn{ '+' | '-' | '*' | '/' | '(' | ')' => true _ => false }) ``` -```moonbit +```moonbit let whitespace : Lexer[Char] = pchar(fn{ ch => ch == ' ' }) ``` @@ -154,14 +154,14 @@ Lastly, we can build a lexical analyzer for integers. 
An integer is either zero ```moonbit // Convert characters to integers via encoding -let zero: Lexer[Int] = +let zero: Lexer[Int] = pchar(fn { ch => ch == '0' }).map(fn { _ => 0 }) -let one_to_nine: Lexer[Int] = +let one_to_nine: Lexer[Int] = pchar(fn { ch => ch.to_int() >= 0x31 && ch.to_int() <= 0x39 },).map(fn { ch => ch.to_int() - 0x30 }) -let zero_to_nine: Lexer[Int] = +let zero_to_nine: Lexer[Int] = pchar(fn { ch => ch.to_int() >= 0x30 && ch.to_int() <= 0x39 },).map(fn { ch => ch.to_int() - 0x30 }) -// number = %x30 / (%x31-39) *(%x30-39) +// number = %x30 / (%x31-39) *(%x30-39) let value : Lexer[Token] = zero.or( one_to_nine.and(zero_to_nine.many()).map( // (Int, @immut/list.T[Int]) fn { (i, ls) => ls.fold_left(fn { i, j => i * 10 + j }, init=i) }, @@ -172,7 +172,7 @@ let value : Lexer[Token] = zero.or( We're now just one step away from finishing lexical analysis: analyzing the entire input stream. There may be whitespaces in between tokens, so we allow arbitrary lengths of whitespaces after defining the number or symbol in line 2. We map and discard the second value in the tuple representing spaces, and may repeat the entire parser an arbitrary number of times. Finally, we can split a string into minus signs, numbers, plus signs, parentheses, etc. However, this output stream doesn't follow the syntax rules of arithmetic expressions. For this, we will need syntax analysis. ```moonbit -let tokens : Lexer[@immut/list.T[Token]] = +let tokens : Lexer[@immut/list.T[Token]] = value.or(symbol).and(whitespace.many()) .map(fn { (symbols, _) => symbols },) // Ignore whitespaces .many() @@ -184,11 +184,11 @@ test{ ## Syntax Analysis -In the last example, we converted a string into a stream of tokens, discarded unimportant whitespaces, and split the string into meaningful enums. Now we will analyze whether the token stream is syntactically valid in terms of arithmetic expressions. 
As a simple example, the parentheses in an expression should be paired and should close in the correct order. We defined a simple syntax rule in the following code snippet. An arithmetic expression can be a single number, two arithmetic expressions carrying out an operation, or an expression surrounded by parentheses. We aim to convert a token stream into an abstract syntax tree like the one shown below. For the expression `1 + (1 - 5)`, the root node is a plus sign, representing the last operation executed. It means adding 1 to the expression on the right side. The right subtree contains a minus sign with integers 1 and 5, meaning 1 minus 5. The parentheses mean that it is executed earlier, so it's deeper down in the expression tree. Similarly, for the expression `(1 - 5) * 5`, the first calculation executed is the subtraction inside the parentheses, and then the multiplication. +In the last example, we converted a string into a stream of tokens, discarded unimportant whitespaces, and split the string into meaningful enums. Now we will analyze whether the token stream is syntactically valid in terms of arithmetic expressions. As a simple example, the parentheses in an expression should be paired and should close in the correct order. We defined a simple syntax rule in the following code snippet. An arithmetic expression can be a single number, two arithmetic expressions carrying out an operation, or an expression surrounded by parentheses. We aim to convert a token stream into an abstract syntax tree like the one shown below. For the expression `1 + (1 - 5)`, the root node is a plus sign, representing the last operation executed. It means adding 1 to the expression on the right side. The right subtree contains a minus sign with integers 1 and 5, meaning 1 minus 5. The parentheses mean that it is executed earlier, so it's deeper down in the expression tree. 
Similarly, for the expression `(1 - 5) * 5`, the first calculation executed is the subtraction inside the parentheses, and then the multiplication. ```abnf expression = Value / "(" expression ")" -expression =/ expression "+" expression / expression "-" expression +expression =/ expression "+" expression / expression "-" expression expression =/ expression "*" expression / expression "/" expression ``` @@ -200,7 +200,7 @@ The modified syntax rules are split into three parts. The first one is `atomic`, ```abnf atomic = Value / "(" expression ")" -combine = atomic / combine "*" atomic / combine "/" atomic +combine = atomic / combine "*" atomic / combine "/" atomic expression = combine / expression "+" combine / expression "-" combine ``` @@ -211,6 +211,7 @@ atomic = Value / "(" expression ")" combine = atomic *( ("*" / "/") atomic) expression = combine *( ("+" / "-") combine) ``` + ```moonbit enum Expression { Number(Int) diff --git a/docs/12-autodiff.md b/docs/12-autodiff.md index 59981f0..65acc5f 100644 --- a/docs/12-autodiff.md +++ b/docs/12-autodiff.md @@ -2,23 +2,23 @@ Today, we will talk about another case study on automatic differentiation (autodiff), while avoiding some of the complex mathematical concepts. -Differentiation is an important operation in computer science. In machine learning, neural networks based on gradient descent apply differentiation to find local minima for training. You might be more familiar with solving functions and approximating zeros using Newton's method. Let's briefly review it. Here, we have plotted a function and set the initial value to 1, which is point A on the number axis. +Differentiation is an important operation in computer science. In machine learning, neural networks based on gradient descent apply differentiation to find local minima for training. You might be more familiar with solving functions and approximating zeros using Newton's method. Let's briefly review it. 
Here, we have plotted a function and set the initial value to 1, which is point A on the number axis. ![](/pics/geogebra-export-0.webp) ![](/pics/geogebra-export-1.webp) -We want to approximate the zeros near it. We calculate point B on the function corresponding to the x-coordinate of this point and find the derivative at the point, which is the slope of the tangent line at that point. +We want to approximate the zeros near it. We calculate point B on the function corresponding to the x-coordinate of this point and find the derivative at the point, which is the slope of the tangent line at that point. ![](/pics/geogebra-export-2.webp) ![](/pics/geogebra-export-3.webp) -By finding the intersection of the tangent line and the x-axis, we get a value that approximates zero. +By finding the intersection of the tangent line and the x-axis, we get a value that approximates zero. ![](/pics/geogebra-export-4.webp) -We then repeat the process to find the point corresponding to the function, calculate the derivative, and find the intersection of the tangent line and the x-axis. +We then repeat the process to find the point corresponding to the function, calculate the derivative, and find the intersection of the tangent line and the x-axis. ![](/pics/geogebra-export-5.webp) @@ -38,12 +38,12 @@ Today, we will look at the following simple combination of functions, involving # Differentiation -There are several ways to differentiate a function. The first method is manual differentiation where we use a piece of paper and a pen as a natural calculator. The drawback is that it's easy to make mistakes with complex expressions and we can't just manually calculate 24 hours a day. The second method is numerical differentiation: $\frac{ \texttt{f}(x + \delta x) - \texttt{f}(x) }{ \delta x }$, where we add a small value (approaching zero) to the point we want to differentiate, calculate the difference, and divide it by the small value. 
The issue here is that computers cannot accurately represent decimals, and the larger the absolute value, the less accurate it is. Also, we cannot fully solve infinite series. The third method is symbolic differentiation, where we convert the function into an expression tree and then operate on the tree to get the derivative. Take $\textit{Mul(Const(2), Var(1))} \to \textit{Const(2)}$ for example: here the differentiation result of constant 2 multiplied by x will be constant 2. The problem with symbolic differentiation is that the calculation results may not be simplified enough, and there may be redundant calculations. In addition, it's hard to directly use native control flow like conditionals and loops. If we want to define a function to find the larger value, we have to define an operator instead of simply comparing the current values. +There are several ways to differentiate a function. The first method is manual differentiation where we use a piece of paper and a pen as a natural calculator. The drawback is that it's easy to make mistakes with complex expressions and we can't just manually calculate 24 hours a day. The second method is numerical differentiation: $\frac{ \texttt{f}(x + \delta x) - \texttt{f}(x) }{ \delta x }$, where we add a small value (approaching zero) to the point we want to differentiate, calculate the difference, and divide it by the small value. The issue here is that computers cannot accurately represent decimals, and the larger the absolute value, the less accurate it is. Also, we cannot fully solve infinite series. The third method is symbolic differentiation, where we convert the function into an expression tree and then operate on the tree to get the derivative. Take $\textit{Mul(Const(2), Var(1))} \to \textit{Const(2)}$ for example: here the differentiation result of constant 2 multiplied by x will be constant 2. 
The problem with symbolic differentiation is that the calculation results may not be simplified enough, and there may be redundant calculations. In addition, it's hard to directly use native control flow like conditionals and loops. If we want to define a function to find the larger value, we have to define an operator instead of simply comparing the current values. ```moonbit no-check // Need to define additional native operators for the same effect fn max[N : Number](x : N, y : N) -> N { - if x.value() > y.value() { x } else { y } + if x.value() > y.value() { x } else { y } } ``` @@ -68,13 +68,13 @@ fn Symbol::op_add(f1 : Symbol, f2 : Symbol) -> Symbol { Add(f1, f2) } fn Symbol::op_mul(f1 : Symbol, f2 : Symbol) -> Symbol { Mul(f1, f2) } // Compute function values -fn Symbol::compute(self : Symbol, input : Array[Double]) -> Double { +fn Symbol::compute(self : Symbol, input : Array[Double]) -> Double { match self { Constant(d) => d Var(i) => input[i] // get value following index Add(f1, f2) => f1.compute(input) + f2.compute(input) Mul(f1, f2) => f1.compute(input) * f2.compute(input) - } + } } ``` @@ -85,7 +85,7 @@ Let's review the derivative rules for any constant function, any variable partia - $\frac{\partial (f + g)}{\partial x_i} = \frac{\partial f}{\partial x_i} + \frac{\partial g}{\partial x_i}$ - $\frac{\partial (f \times g)}{\partial x_i} = \frac{\partial f}{\partial x_i} \times g + f \times \frac{\partial g}{\partial x_i}$ -We'll use the previous definition to construct our example function. As we can see, the multiplication and addition operations look very natural because MoonBit allows us to overload some operators. +We'll use the previous definition to construct our example function. As we can see, the multiplication and addition operations look very natural because MoonBit allows us to overload some operators. 
```moonbit fn differentiate(self : Symbol, val : Int) -> Symbol { @@ -110,7 +110,7 @@ test "Symbolic differentiation" { let symbol : Symbol = example() // Abstract syntax tree of the function assert_eq!(symbol.compute(input), 600.0) // Expression of df/dx - inspect!(symbol.differentiate(0), + inspect!(symbol.differentiate(0), content="Add(Add(Mul(Mul(Constant(5.0), Var(0)), Constant(1.0)), Mul(Add(Mul(Constant(5.0), Constant(1.0)), Mul(Constant(0.0), Var(0))), Var(0))), Constant(0.0))") assert_eq!(symbol.differentiate(0).compute(input), 100.0) } @@ -155,7 +155,7 @@ let diff_0_simplified : Symbol = Mul(Constant(5.0), Var(0)) ## Automatic Differentiation -Now, let's take a look at automatic differentiation. We first define the operations we want to implement through an interface, which includes constant constructor, addition, and multiplication. We also want to get the value of the current computation. +Now, let's take a look at automatic differentiation. We first define the operations we want to implement through an interface, which includes constant constructor, addition, and multiplication. We also want to get the value of the current computation. ```moonbit trait Number { @@ -225,9 +225,9 @@ test "Forward differentiation" { ### Backward Differentiation -Backward differentiation utilizes the chain rule for calculation. Suppose we have a function $w$ of $x$, $ y$, $z$, etc., and $x$, $y$, $z$, etc. are functions of $t$. Then the partial derivative of $w$ with respect to $t$ is the partial derivative of $w$ with respect to $x$ times the partial derivative of $x$ with respect to $t$, plus the partial derivative of $w$ with respect to $y$ times the partial derivative of $y$ with respect to $t$, plus the partial derivative of $w$ with respect to $z$ times the partial derivative of $z$ with respect to $t$, and so on. +Backward differentiation utilizes the chain rule for calculation. Suppose we have a function $w$ of $x$, $ y$, $z$, etc., and $x$, $y$, $z$, etc. 
are functions of $t$. Then the partial derivative of $w$ with respect to $t$ is the partial derivative of $w$ with respect to $x$ times the partial derivative of $x$ with respect to $t$, plus the partial derivative of $w$ with respect to $y$ times the partial derivative of $y$ with respect to $t$, plus the partial derivative of $w$ with respect to $z$ times the partial derivative of $z$ with respect to $t$, and so on. -- Given $w = f(x, y, z, \cdots), x = x(t), y = y(t), z = z(t), \cdots$ +- Given $w = f(x, y, z, \cdots), x = x(t), y = y(t), z = z(t), \cdots$ $\frac{\partial w}{\partial t} = \frac{\partial w}{\partial x} \frac{\partial x}{\partial t} + \frac{\partial w}{\partial y} \frac{\partial y}{\partial t} + \frac{\partial w}{\partial z} \frac{\partial z}{\partial t} + \cdots$ For example, for $f(x_0, x_1) = x_0 ^ 2 \times x_1$, we can consider $f$ as a function of $g$ and $h$, where $g$ and $h$ are $x_0 ^ 2$ and $x_1$ respectively. We differentiate each component: the partial derivative of $f$ with respect to $g$ is $h$; the partial derivative of $f$ with respect to $h$ is $g$; the partial derivative of $g$ with respect to $x_0$ is $2x_0$, and the partial derivative of $h$ with respect to $x_0$ is 0. Lastly, we combine them using the chain rule to get the result $2x_0x_1$. Backward differentiation is the process where we start with the partial derivative of $f$ with respect to $f$, followed by calculating the partial derivatives of $f$ with respect to the intermediate functions and their partial derivatives with respect to the intermediate functions, until we reach the partial derivatives with respect to the input parameters. This way, by tracing backward and creating the computation graph of $f$ in reverse order, we can compute the derivative of each input node. This is suitable for cases where there are more input parameters than output parameters. 
@@ -247,7 +247,7 @@ struct Backward { fn Backward::var(value : Double, diff : Ref[Double]) -> Backward { // Update the partial derivative along a computation path df / dvi * dvi / dx - { value, backward: fn { d => diff.val = diff.val + d } } + { value, backward: fn { d => diff.val = diff.val + d } } } fn Backward::constant(d : Double) -> Backward { @@ -303,7 +303,7 @@ Then, we'll use Newton's method to find the value. Since there is only one param } ``` -To approximate zeros with Newton's method: +To approximate zeros with Newton's method: - First, define $x$ as the iteration variable with an initial value of 1.0. Since $x$ is the variable with respect to which we are differentiating, we'll set the second parameter to be true. - Second, define an infinite loop. @@ -332,5 +332,3 @@ test "Newton's method" { # Summary To summarize, in this lecture we introduced the concept of automatic differentiation. We presented symbolic differentiation and two different implementations of automatic differentiation. For students interested in learning more, we recommend the *3Blue1Brown* series on deep learning (including topics like [gradient descent](https://www.youtube.com/watch?v=IHZwWFHWa-w), [backpropagation algorithms](https://www.youtube.com/watch?v=Ilg3gGewQ5U)), and try to write your own neural network. - - diff --git a/docs/13-neural-network.md b/docs/13-neural-network.md index b2bbe1b..fadb116 100644 --- a/docs/13-neural-network.md +++ b/docs/13-neural-network.md @@ -219,6 +219,7 @@ The above is the entire structure of this neural network. Now, we will try to im ### Basic Operations First, we define an abstraction for operations. The operations we need to perform include: + - `constant`: type conversion from `Double` to a certain type; - `value`: retrieving the intermediate result from that type; and - `op_add`, `op_neg`, `op_mul`, `op_div`, `exp`: addition, multiplication, division, negation, and exponential operations. 
@@ -272,7 +273,6 @@ fn softmax[T : Base](inputs : Array[T]) -> Array[T] { Next, let's implement the forward propagation function from the input layer to the hidden layer. We iterate over each parameter $w_i$ and each input $x_i$ to calculate $\sum w_i x_i$, and add the bias $c$. The result is then passed to the ReLU function through the pipeline operator `|>` to obtain the final output. - ```moonbit fn input2hidden[T : Base](inputs: Array[Double], param: Array[Array[T]]) -> Array[T] { let outputs : Array[T] = Array::make(param.length(), T::constant(0.0)) @@ -332,12 +332,12 @@ fn cross_entropy[T : Base + Log](inputs: Array[T], expected: Int) -> T { ### Gradient Descent -With the cost function in place, the next step is to perform gradient descent through backpropagation. Since we already covered this in [Chapter 12](./autodiff), we will simply show the code here. +With the cost function in place, the next step is to perform gradient descent through backpropagation. Since we already covered this in [Chapter 12](./autodiff), we will simply show the code here. Accumulate the partial derivatives: ```moonbit -fn Backward::param(param: Array[Array[Double]], diff: Array[Array[Double]], +fn Backward::param(param: Array[Array[Double]], diff: Array[Array[Double]], i: Int, j: Int) -> Backward { { value: param[i][j], @@ -387,10 +387,10 @@ That is to say, the learning rate gradually decreases as the number of training ### Training and Testing -Finally, we can use the data to train our neural network. Usually, we need to randomly divide the dataset into two parts: the training set and the testing set. During the training phase, we compute the cost function and perform differentiation based on the data from the training set, and adjust the parameters based on the learning rate. The testing set is used to evaluate the results after all the training is completed. This is to avoid overfitting, where the model performs well on the training set but fails on general cases. 
This is also why the data in the training set needs to be randomly split. +Finally, we can use the data to train our neural network. Usually, we need to randomly divide the dataset into two parts: the training set and the testing set. During the training phase, we compute the cost function and perform differentiation based on the data from the training set, and adjust the parameters based on the learning rate. The testing set is used to evaluate the results after all the training is completed. This is to avoid overfitting, where the model performs well on the training set but fails on general cases. This is also why the data in the training set needs to be randomly split. For example, if our complete dataset contains three types of iris flowers, and the training set only contains two types of iris flowers, the model will never be able to correctly identify the third type of iris flower. In our case, we have a relatively small amount of training data, so we can perform full-batch training, meaning that each epoch consists of one iteration, in which all the training samples are used. However, if we have a larger amount of data, we may perform mini-batch training instead by splitting an epoch into several iterations and selecting a subset of the training data for each iteration. ## Summary -This chapter introduces the basics of neural networks, including the structure and the training process of a neural network. Although the content is not in-depth, it is enough to give you a preliminary understanding of the topic. 
If you are interested, you can find the complete code [here](https://try.moonbitlang.com/examples/course/lec13/neural_network.mbt). diff --git a/docs/14-stack-machine.md b/docs/14-stack-machine.md index cfb2d3e..61b83f1 100644 --- a/docs/14-stack-machine.md +++ b/docs/14-stack-machine.md @@ -1,4 +1,4 @@ -# 14. Case Study: Stack Machine +# 14. Case Study: Stack Machine In this chapter, we are going to implement a simple stack-based virtual machine based on WebAssembly. @@ -15,7 +15,9 @@ In addition to compilation and interpretation, another way is to combine the two There are two common types of virtual machines: one is the stack-based virtual machine, where operands are stored on a stack following the Last-In First-Out (LIFO) principle; the other is the register-based virtual machine, where operands are stored in registers like what actually happens in a normal computer. The stack-based virtual machine is simpler to implement and has a smaller code size, while the register-based virtual machine is closer to the actual computer organization and has higher performance. Taking the `max` function as an example, + - Lua VM (register-based): + ``` MOVE 2 0 0 ; R(2) = R(0) LT 0 0 1 ; R(0) < R(1)? @@ -24,7 +26,9 @@ Taking the `max` function as an example, RETURN 2 2 0 ; return R(2) RETURN 0 1 0 ; return ``` + - WebAssembly VM (stack-based): + ```wasm local.get $a local.set $m ;; let mut m = a local.get $a local.get $b i32.lt_s ;; if a < b { @@ -113,7 +117,6 @@ To set the value of an local variable, we can use the `Local_Set` instruction. After the function `add` is defined, we can call it to perform some calculations. Just like what we did in our first example, we first put `1` and `2` on the stack. Then, instead of using the `Add` instruction, we call the `add` function we defined using the `Call` instruction. 
At this time, according to the number of function parameters, the corresponding number of elements on the top of the stack will be consumed, bound to local variables in order, and an element representing the function call will be pushed to the stack. It separates the original stack elements from the function's own data, and also records the number of its return values. After the function call is finished, according to the number of return values, we take out the elements from the top of the stack, remove the element for the function call, and then put the original top elements back. After that, we get the calculation result at the place where the function is called. - ```moonbit no-check @immut/list.Ts::[ Const(I32(1)), Const(I32(2)), Call("add") ] ``` @@ -125,9 +128,9 @@ After the function `add` is defined, we can call it to perform some calculations For conditional statements, as we introduced earlier, we use a 32-bit integer to represent `true` or `false`. When we execute the `If` statement, we take out the top element of the stack. If it is non-zero, the `then` branch will be executed; otherwise, the `else` branch will be executed. It is worth noting that each code block in Wasm has parameter types and return value types, corresponding to the elements to be consumed from the top of the stack when entering the code block, and the elements to be put on the top of the stack when exiting the code block. For example, when we enter the `if/else` block, there is no input, so we assume that the stack is empty when we perform calculations inside the block; whatever is on the stack originally is irrelevant to the current code block. Since we declared that an integer is returned, when we normally end the execution, there must be one and only one integer in the current calculation environment. 
```moonbit no-check -@immut/list.of([ - Const(I32(1)), Const(I32(0)), Equal, - If(1, @immut/list.of([Const(I32(1))]), @immut/list.of([Const(I32(0))])) +@immut/list.of([ + Const(I32(1)), Const(I32(0)), Equal, + If(1, @immut/list.of([Const(I32(1))]), @immut/list.of([Const(I32(0))])) ]) ``` @@ -273,7 +276,6 @@ struct State { What we need to do now is calculate the next state based on the previous state by pattern matching on the current instruction and data stack. Since errors may occur, the returned state should be wrapped by `Option`. If the match is successful, like the `Add` instruction here, there should be two consecutive integers representing the operands at the top of the stack, then we can calculate the next state. If all matches fail, it means something went wrong, and we use a wildcard to handle such cases and return a `None`. - ```moonbit fn evaluate(state : State, stdout : Buffer) -> Option[State] { match (state.instructions, state.stack) { @@ -323,9 +325,10 @@ After execution, it should be encountering the control instruction to return the ## Summary In this chapter we - - Learned the structure of a stack-based virtual machine - - Introduced a subset of the WebAssembly instruction set - - Implemented a compiler - - Implemented an interpreter + +- Learned the structure of a stack-based virtual machine +- Introduced a subset of the WebAssembly instruction set +- Implemented a compiler +- Implemented an interpreter Interested readers may try to expand the definition of functions in the syntax parser, or add the `return` instruction to the instruction set.
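As a starting point for that exercise, a `Return` instruction might be sketched as follows (hypothetical; everything beyond the chapter's `State`, `evaluate`, and instruction list is an assumption):

```moonbit no-check
// Hypothetical sketch of a Return instruction for the exercise.
enum Instruction {
  // ... the existing instructions ...
  Return // finish the current function call immediately
}

fn evaluate(state : State, stdout : Buffer) -> Option[State] {
  match (state.instructions, state.stack) {
    // On Return, drop the callee's remaining instructions and unwind
    // the stack to the enclosing call frame, keeping as many values
    // as the frame's recorded number of return values -- the same
    // bookkeeping as when a function body runs to its end.
    (Cons(Return, _), stack) => ...
    // ... the existing cases ...
    _ => None
  }
}
```

The key observation is that the element pushed for each function call already records its number of return values, so `Return` can reuse the exact unwinding logic of a normal function exit.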