Commit 54cccd0

OOPLSA typos + comment
1 parent a356fb1 commit 54cccd0

1 file changed

+33
-22
lines changed

papers/2024-oopsla-typedc-dependent-nominal-physical-type-system.md

+33-22
@@ -22,14 +22,14 @@ rules. The compiler assumes that you do not make these mistakes, which
 are not checked, meaning that if you do make them, your program
 may crash, corrupt memory, or exhibit other unpredictable
 behaviors. Moreover, these mistakes can be exploited by an attacker to
-take control of the execution of the processus running your program,
+take control of your program,
 and this represents both the most common and the most severe kind of
 security vulnerabilities.


 Thus, it is important to ensure that your program is free from
-these undefined behaviors. Providing tools that do this in a
-practical way is one of the main purposes behind the development of
+these undefined behaviors. Providing tools that do this practically
+is one of the main purposes behind the development of
 [Codex](https://codex.top), a sound static analyzer based on abstract
 interpretation. The paper focuses on particular method that can
 ensure spatial memory safety of C or binary programs almost
@@ -38,18 +38,20 @@ automatically, requiring only a small amount of type annotations.

 ## Example

-This examples comes from our
+This example comes from our
 [tutorial](/docs/tutorial_oopsla2024.pdf), and extracted from an OS
 code that we have analyzed.

 Suppose that we are given a function in a library described using the following header file:

 ```c
+// Linked list of messages, each containing a fixed-length buffer
 struct message {
 struct message *next;
 char *buffer;
 };

+// Wrapper around the linked list, specifies the length of all buffers
 struct message_box {
 int length;
 struct message *first;
@@ -58,23 +60,23 @@ struct message_box {
 void zeros_buffer(struct message_box* box);
 ```

-An examples of a memory layout that would fit this description is the following one:
+An example of a memory layout that would fit this description is the following one:

 <img src="/assets/publications/imgs/2024-oopsla-struct_layout.png"
 style="width:700px; display:block; margin-left:auto; margin-right:auto">

-Here, we assumed that `message` is a singly-linked list, and that the `char *` pointer points to a single char.
+In the image, we assumed that `message` is a singly-linked list, and that the `char *` pointer points to a single char.

 Now, we want to verify that the implementation of `zeros_buffer`, a function that sets all the buffers in the `message_box` to zero, is memory-safe.

 ```c
 void zeros_buffer(struct message_box *box) {
-
+
 struct message * first = box->first;
 struct message * current = first;
-
+
 int length = box->length;
-
+
 do {
 for (int i = 0; i < length; i++) {
 current->buffer[i] = 0 ;
@@ -85,21 +87,22 @@ void zeros_buffer(struct message_box *box) {
 ```

 Note that this function is memory-safe only if the `box` parameter follows some invariants, in particular:
-- The list of `message`s is circular (the code never tests the `next` field to see if it can be a null pointer)
-- Each `message` points to a `buffer` whose size corresponds to `box->length`.
+1. Each `message` points to a `buffer` whose size corresponds to `box->length`.
+2. The list of `message`s is circular (the code never tests the `next` field to see if it can be a null pointer)
+

 So, if we try to analyze this code as is, Codex will correctly report that the code is not memory safe. Indeed, a main feature of Codex is that the analysis is sound: if there is a spatial memory safety issue, it should report it.

 Luckily, it is easy to express the required invariants in our type system. It suffices to copy the header file, and edit it as follows:

 ```c
-struct message(len) {
+struct message(len) {
 struct message(len)+ next;
 char[len]+ buffer;
 };

-∃ mlen:integer with self > 0.
-struct message_box {
+∃ mlen:integer with self > 0.
+struct message_box {
 (integer with self = mlen) length;
 struct message(mlen)+ first;
 };
@@ -123,12 +126,20 @@ Now, this updated header file is not for the C compiler, but is used
 by our Codex tool, that can now verify that `zeros_buffer` is
 memory-safe (as it does not report any alarm) automatically. Note that
 this proof relies on the hypothesis that the `box` argument of
-`zeros_buffer` correspond to the memory layout described by the types;
-but that in any analyzed function that would call `zeros_buffer`, we
-would check this hypothesis. Thus, if you verify all the functions in
-a program, we prove it memory-safe.
-
-Finally, this verification of `zeros_buffer` can be make not only on
+`zeros_buffer` correspond to the memory layout described by the types.
+This assumption is checked in any analyzed function that would call `zeros_buffer`.
+Thus, if you verify all the functions in a program, we prove it memory-safe.
+
+{: .note }
+While codex **ensures spatial memory safety** (no invalid pointer read/write),
+it does **not ensure termination**.
+Even with our given types, the `zeros_buffer` function may loop infinitely.
+Indeed, we cannot express the invariant stating the list is circular. It is sort
+of implied by the constraints that the `next` pointer is never null, since memory
+is finite, the list will eventually reach a loop. However, we may have a lasso-shape,
+where the first few `message`s are not part of that loop.
+
+Finally, this verification of `zeros_buffer` can be made not only on
 the C source code, but also on the compiled machine code, i.e. Codex
 can perform type-checking of both C and machine code automatically!

@@ -140,7 +151,7 @@ inspired by that of C, as the basis for this abstraction. While
 initial versions of this type system have been proposed in
 [VMCAI'22](/papers/2022-vcmai-lightweight-shape-analysis.html) and
 used in [RTAS'21](/papers/2021-rtas-no-crash-no-exploit.html), this
-paper extends it significatively with new features like support for
+paper extends it significantly with new features like support for
 union, parameterized, and existential types. The paper shows how to
 combine all these features to encode many complex low-level idioms,
 such as flexible array members or discriminated unions using a memory
@@ -154,7 +165,7 @@ memory area whose length corresponds to the contents of this
 integer". Thus, a type system that can be used to guarantee memory
 safety must use dependent types. This makes type checking particularly
 complex, which is why we use abstract interpretation to type-check the
-program. Abstract interpretation also allows automatical infererence
+program. Abstract interpretation also allows automatic inference
 of other kinds of program invariants (beyond those expressed by the
 type system), that helps the overall analysis to type-check the
 program and verify its spatial memory safety.
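
The two invariants that the diff lists for `zeros_buffer` (every `buffer` holds `box->length` bytes, and the list is circular) can be illustrated in plain C. The sketch below is not Codex and performs no verification. It only builds a `message_box` that satisfies both invariants by construction and then runs the function on it. The helpers `xmalloc`, `make_box`, and `all_zero` are hypothetical names introduced for this illustration, and the tail of the `do`/`while` loop in `zeros_buffer` is an assumed completion, since the diff elides it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct message {
    struct message *next;
    char *buffer;
};

struct message_box {
    int length;
    struct message *first;
};

/* Assumed completion: the diff shows the loop body but elides its tail.
   Here we advance to `next` and stop once we are back at `first`. */
void zeros_buffer(struct message_box *box) {
    struct message *first = box->first;
    struct message *current = first;
    int length = box->length;
    do {
        for (int i = 0; i < length; i++) {
            current->buffer[i] = 0;
        }
        current = current->next;
    } while (current != first);
}

/* Hypothetical helper: allocate or abort, to keep the sketch short. */
static void *xmalloc(size_t n) {
    void *p = malloc(n);
    if (p == NULL) abort();
    return p;
}

/* Hypothetical helper: builds `count` messages whose buffers all have
   exactly `length` bytes (invariant 1) and links the last message back
   to the first one (invariant 2). Buffers start filled with 'x'. */
struct message_box *make_box(int count, int length) {
    assert(count > 0 && length > 0);
    struct message_box *box = xmalloc(sizeof *box);
    box->length = length;
    box->first = NULL;
    struct message *last = NULL;
    for (int k = 0; k < count; k++) {
        struct message *m = xmalloc(sizeof *m);
        m->buffer = xmalloc((size_t)length);
        memset(m->buffer, 'x', (size_t)length);
        if (last == NULL) box->first = m; else last->next = m;
        last = m;
    }
    last->next = box->first; /* close the cycle */
    return box;
}

/* Hypothetical helper: returns 1 if every byte of every buffer is 0. */
int all_zero(const struct message_box *box) {
    const struct message *current = box->first;
    do {
        for (int i = 0; i < box->length; i++) {
            if (current->buffer[i] != 0) return 0;
        }
        current = current->next;
    } while (current != box->first);
    return 1;
}
```

Calling `zeros_buffer(make_box(3, 4))` stays within bounds and terminates because both invariants hold. With a lasso-shaped list, as described in the added note, the writes would remain in bounds but this loop would never return to `first`.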
