Skip to content

Commit 796fb38

Browse files
Merge pull request #7497 from NlightNFotis/document_symex_instructions
Document symex instructions
2 parents 19470af + a4464fb commit 796fb38

File tree

2 files changed

+143
-11
lines changed

2 files changed

+143
-11
lines changed

doc/architectural/cbmc-architecture.md

+12-11
Original file line numberDiff line numberDiff line change
@@ -63,40 +63,41 @@ For an explanation of the data structures involved in the modeling of a GOTO
6363
program (the GOTO Intermediate Representation used by CBMC and assorted tools)
6464
please see \subpage central-data-structures .
6565

66-
## {C, java bytecode} → Parse tree → Symbol table → GOTO programs → GOTO program transformations → BMC → counterexample (goto_tracet) → printing ##
66+
## {C, java bytecode} → Parse tree → Symbol table → GOTO programs → GOTO program transformations → BMC → counterexample (goto_tracet) → printing
6767

6868
To be documented.
6969

70-
## Instrumentation: goto functions → goto functions ##
70+
## Instrumentation: goto functions → goto functions
7171

7272
To be documented.
7373

74-
## Goto functions → BMC → Counterexample (trace) ##
74+
## Goto functions → BMC → Counterexample (trace)
7575

76-
To be documented.
76+
For an explanation of part of how the BMC (Symex) process works, please refer
77+
to [Symex and GOTO program instructions](symex-instructions.md)
7778

78-
## Trace → interpreter → memory map ##
79+
## Trace → interpreter → memory map
7980

8081
To be documented.
8182

82-
## Goto functions → abstract interpretation ##
83+
## Goto functions → abstract interpretation
8384

8485
To be documented.
8586

86-
## Executables (flow of transformations): ##
87+
## Executables (flow of transformations):
8788

88-
### goto-cc ###
89+
### goto-cc
8990

9091
To be documented.
9192

92-
### goto-instrument ###
93+
### goto-instrument
9394

9495
To be documented.
9596

96-
### cbmc ###
97+
### cbmc
9798

9899
To be documented.
99100

100-
### goto-analyzer ###
101+
### goto-analyzer
101102

102103
To be documented.
+131
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,131 @@
1+
# Symex and GOTO program instructions
2+
3+
In [doc/central-data-structures](central-data-structures.md) we talked about
4+
the relationship between various central data structures of CBMC.
5+
6+
We identified the `goto_programt` as being the core data structure representing
7+
GOTO-IR, which we defined as a list of GOTO program instructions.
8+
9+
As a reminder, instructions are composed of three subcomponents:
10+
11+
* An instruction type,
12+
* A statement (denoting the actual code the instruction contains),
13+
* An instruction guard (an expression that needs to be evaluated to `true` before
14+
the instruction is to be executed - if one is attached to the instruction).
15+
16+
In this document, we are going to take a closer look at the first subcomponent,
17+
the instruction types, along with the interplay between the instructions and a
18+
central CBMC module, the *symbolic execution engine* (from now on, just *symex*).
19+
20+
## A (very) short introduction to Symex
21+
22+
Symex is, at its core, a GOTO-program interpreter that uses symbolic values instead of actual ones.
23+
This produces a formula which describes all possible outputs rather than a single output value.
24+
While Symex is interpreting the program, it also builds a list of Static Single Assignment (SSA)
25+
steps that form part of the equation that is to be sent to the solver. For more information see
26+
[src/goto-symex](../../src/goto-symex/README.md).
27+
28+
You can see the main instruction dispatcher (what corresponds to the main interpreter
29+
loop) at `goto_symext::execute_next_instruction`.
30+
31+
Symex's source code is available under [src/goto-symex](../../src/goto-symex/).
32+
33+
## Instruction Types
34+
35+
Every GOTO-program instruction has a specific type. You can see the instruction type
36+
at #goto_program_instruction_typet but we will also list some of them here for illustration
37+
purposes:
38+
39+
```c
40+
enum goto_program_instruction_typet
41+
{
42+
[...]
43+
GOTO = 1, // branch, possibly guarded
44+
ASSUME = 2, // assumption
45+
ASSERT = 3, // assertions
46+
SKIP = 5, // just advance the PC
47+
SET_RETURN_VALUE = 12, // set function return value (no control-flow change)
48+
ASSIGN = 13, // assignment lhs:=rhs
49+
DECL = 14, // declare a local variable
50+
DEAD = 15, // marks the end-of-live of a local variable
51+
FUNCTION_CALL = 16, // call a function
52+
[...]
53+
};
54+
```
55+
56+
(*NOTE*: The above is for illustration purposes only - the list is not complete, neither is it
57+
expected to be kept up-to-date. Please refer to the type #goto_program_instruction_typet for a
58+
list of what the instructions look like at this point in time.)
59+
60+
You can observe these instruction types in the user-friendly print of the goto-program that
61+
various CPROVER programs produce with the flag `--show-goto-functions`. For a live example,
62+
consider the following C file:
63+
64+
```c
65+
int main(int argc, char **argv)
66+
{
67+
int a[] = {0, 1, 2, 3};
68+
__CPROVER_assert(a[3] != 3, "expected failure: last element of array 'a' is equal to 3");
69+
}
70+
```
71+
72+
If we ask CBMC to print the GOTO-program instructions with `--show-goto-functions`, then part
73+
of the output looks like this:
74+
75+
```c
76+
[...]
77+
78+
main /* main */
79+
// 0 file /tmp/example.c line 3 function main
80+
DECL main::1::arry : signedbv[32][4]
81+
// 1 file /tmp/example.c line 3 function main
82+
ASSIGN main::1::arry := { 0, 1, 2, 3 }
83+
// 2 file /tmp/example.c line 4 function main
84+
ASSERT main::1::arry[cast(3, signedbv[64])] ≠ 3 // expected failure: last arry element is equal to 3
85+
// 3 file /tmp/example.c line 5 function main
86+
DEAD main::1::arry
87+
// 4 file /tmp/example.c line 5 function main
88+
SET RETURN VALUE side_effect statement="nondet" is_nondet_nullable="1"
89+
// 5 file /tmp/example.c line 5 function main
90+
END_FUNCTION
91+
```
92+
93+
In the above excerpt, the capitalised words at the beginning each instruction are the
94+
correspondent instruction types (`DECL`, `ASSIGN`, etc).
95+
96+
---
97+
98+
Symex (as mentioned above) is going to pick a designated entry point and then start going through
99+
each instruction. This happens at `goto_symext::execute_next_instruction`. While doing so, it will
100+
inspect the instruction's type, and then dispatch to a designated handling function (which usually
101+
go by the name `symex_<instruction-type>`) to handle that particular instruction type and its
102+
symbolic execution. In pseudocode, it looks like this:
103+
104+
```c
105+
switch(instruction.type())
106+
{
107+
[...]
108+
109+
case GOTO:
110+
symex_goto(state);
111+
break;
112+
113+
case ASSUME:
114+
symex_assume(state, instruction.condition());
115+
break;
116+
```
117+
118+
The way the [`symex` subfolder](../../src/goto-symex/) is structured, the different
119+
dispatching functions are usually in their own file, designated by the instruction's
120+
name. As an example, you can find the code for the function goto_symext::symex_goto
121+
in [symex_goto.cpp](../../src/goto-symex/symex_goto.cpp)
122+
123+
The output of symex is an symex_target_equationt which consists of equalities between
124+
different expressions in the program. You can visualise it as a data structure that
125+
serialises to this: `(a = 5 ∨ a = 3) ∧ (b = 3) ∧ (c = 4) ∧ ...` that describe assignments
126+
and conditions for a range of possible executions of a program that cover a range of
127+
potential paths within it.
128+
129+
This is a high-level overview of symex and goto-program instructions.
130+
For more information (and a more robust introduction), please have a look
131+
at \ref symbolic-execution.

0 commit comments

Comments
 (0)