@@ -2,11 +2,13 @@
 
 - [Perf features](#perf的功能)
 - [Concepts](#概念)
- - [CPU cache](#cpu-cache)
+ - [Instruction-Level Parallelism](#instruction-level-parallelism)
 - [Instruction pipelining](#instruction-pipelining)
 - [Superscalar processor](#superscalar-processor)
 - [Out-of-order execution](#out-of-order-execution)
+ - [Pipeline Hazard](#pipeline-hazard)
 - [Branch prediction](#branch-predication)
+ - [CPU cache](#cpu-cache)
 - [Performance Monitor Unit](#performance-monitor-unit)
 - [Hardware performance counter](#hardware-performance-counter)
 - [Model-Specific Registers](#model-specific-registers)

@@ -99,10 +101,8 @@
 
 ## Concepts
 
-### CPU cache
-> A **CPU cache** is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, closer to a processor core, which stores copies of the data from frequently used main memory locations. Most CPUs have different independent caches, including instruction and data caches, where the data cache is usually organized as a hierarchy of more cache levels (L1, L2, etc.).
-
-
+### Instruction-Level Parallelism
+
 
 ### Instruction pipelining
 > **Instruction pipelining** is a technique that implements a form of parallelism called instruction-level parallelism within a single processor. It therefore allows faster CPU throughput (the number of instructions that can be executed in a unit of time) than would otherwise be possible at a given clock rate. The basic instruction cycle is broken up into a series called a pipeline. Rather than processing each instruction sequentially (finishing one instruction before starting the next), each instruction is split up into a sequence of dependent steps so different steps can be executed in parallel and instructions can be processed concurrently (starting one instruction before finishing the previous one).
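The throughput claim in the pipelining quote can be quantified with the usual idealized model, which the notes do not spell out: k stages of one cycle each and no hazards. Executing n instructions strictly one after another takes n·k cycles, while a pipeline needs k cycles to fill and then completes one instruction per cycle, for k + n - 1 cycles in total. A minimal sketch under those assumptions; the function names and the 5-stage depth are illustrative choices, not taken from the notes:

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Idealized pipeline (assumed: one cycle per stage, no stalls):
    n_stages cycles to fill, then one instruction completes per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def speedup(n_instructions: int, n_stages: int) -> float:
    """Ratio of unpipelined to pipelined cycle counts (n_instructions > 0)."""
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(n_instructions, n_stages)


if __name__ == "__main__":
    STAGES = 5  # assumed classic pipeline: IF, ID, EX, MEM, WB
    for n in (1, 5, 100, 10_000):
        print(n, unpipelined_cycles(n, STAGES), pipelined_cycles(n, STAGES),
              round(speedup(n, STAGES), 2))
```

The speedup approaches k as n grows (about 4.81 for 100 instructions, close to 5.0 for 10,000), but real pipelines stay below it because hazards force stalls.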
|
@@ -124,11 +124,14 @@
 > In computer engineering, **out-of-order execution** (or more formally **dynamic execution**) is a paradigm used in most high-performance microprocessors to make use of instruction cycles that would otherwise be wasted by a certain type of costly delay. In this paradigm, a processor executes instructions in an order governed by the availability of input data, rather than by their original order in a program. In doing so, the processor can avoid being idle while waiting for the preceding instruction to complete to retrieve data for the next instruction in a program, processing instead the next instructions that are able to run immediately and independently.
 
 
+
+### Pipeline Hazard
 * Data dependency: the next instruction uses the result computed by this instruction
 * Control dependency: an instruction determines where the next instruction is located, as when executing a jump, call, or return instruction
-* Pipeline hazards
- * Data hazard
- * Control hazard
+#### Pipeline hazards
+* **Structural hazard**: the hardware cannot support the combination of instructions that need to execute in the same clock cycle
+* **Data hazard** (also called a pipeline data hazard): an instruction cannot execute in its planned clock cycle because the data it needs is not yet available
+* **Control hazard**: an instruction cannot execute in its planned clock cycle because the instruction that was fetched is not the one needed (that is, the flow of instruction addresses is not what the pipeline expected)
 * Stalling the pipeline avoids the hazard
 * Data forwarding, also called bypassing, avoids the stall
 * Load-use hazard: data loaded by a load instruction is not yet available to the instruction that uses it
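The stalling, forwarding and load-use bullets above can be made concrete with a small stall counter. This is a sketch under textbook assumptions that the notes do not state: a classic in-order five-stage pipeline (IF, ID, EX, MEM, WB), a register file written in the first half of a cycle and read in the second half, ALU results forwarded from the end of EX and load data from the end of MEM. `Instr` and `count_stalls` are hypothetical names, and only read-after-write data hazards are modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Instr:
    op: str                # mnemonic; only "lw" is treated as a load
    dest: Optional[str]    # register written, or None (e.g. for a store)
    srcs: Tuple[str, ...]  # registers read


def count_stalls(program: Sequence[Instr], forwarding: bool) -> int:
    """Total stall cycles caused by read-after-write hazards in an
    in-order 5-stage pipeline (IF, ID, EX, MEM, WB)."""
    ready: Dict[str, int] = {}  # register -> earliest cycle a consumer may be in ID
    prev_id = -1                # ID cycle of the previous instruction
    stalls = 0
    for ins in program:
        earliest = prev_id + 1  # hazard-free: one instruction enters ID per cycle
        start = max([earliest] + [ready.get(r, 0) for r in ins.srcs])
        stalls += start - earliest
        prev_id = start
        if ins.dest is None:
            continue
        if not forwarding:
            # Value reaches the register file in WB (ID + 3); the consumer
            # reads it in the second half of that same cycle.
            ready[ins.dest] = start + 3
        elif ins.op == "lw":
            # Load data exists only at the end of MEM, so the consumer's EX
            # must wait one extra cycle: the load-use hazard.
            ready[ins.dest] = start + 2
        else:
            # ALU result is forwarded from the end of EX to the next EX.
            ready[ins.dest] = start + 1
    return stalls


# lw t1, 0(t0); add t2, t1, t1; sub t3, t2, t0
example = [
    Instr("lw", "t1", ("t0",)),
    Instr("add", "t2", ("t1", "t1")),
    Instr("sub", "t3", ("t2", "t0")),
]

if __name__ == "__main__":
    print("stalls without forwarding:", count_stalls(example, forwarding=False))
    print("stalls with forwarding:   ", count_stalls(example, forwarding=True))
```

For this three-instruction example the counter reports 4 stall cycles without forwarding and 1 with it. The remaining stall is the load-use hazard, which compilers often hide by scheduling an independent instruction between the load and its use.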

@@ -139,6 +142,11 @@
 
 
 
+### CPU cache
+> A **CPU cache** is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, closer to a processor core, which stores copies of the data from frequently used main memory locations. Most CPUs have different independent caches, including instruction and data caches, where the data cache is usually organized as a hierarchy of more cache levels (L1, L2, etc.).
+
+
+
 ### Performance Monitor Unit
 
 > **Performance Monitoring Unit**, or the **PMU**, is found in all high-end processors these days. The PMU is basically hardware built inside a processor to measure its performance parameters. We can measure parameters like instruction cycles, cache hits, cache misses, branch misses and many others, depending on the support (i.e. hardware) provided by the processor. And since the measurement is done by the hardware, there is very little overhead.
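To see where the cache hits and misses mentioned in the CPU cache and PMU text come from, here is a minimal direct-mapped cache simulator. It is only a sketch: real L1 caches are usually set-associative, the 64 lines × 64-byte geometry is an arbitrary assumption, and `simulate_direct_mapped` is a hypothetical helper, not part of perf.

```python
from typing import Iterable, List, Optional, Tuple


def simulate_direct_mapped(addresses: Iterable[int], num_lines: int,
                           block_size: int) -> Tuple[int, int]:
    """Return (hits, misses) for a direct-mapped cache replaying byte addresses."""
    tags: List[Optional[int]] = [None] * num_lines  # None marks an empty (invalid) line
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size  # memory block containing this byte
        index = block % num_lines   # the only cache line this block can occupy
        tag = block // num_lines    # tells apart blocks that share an index
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag       # fill the line, evicting whatever was there
    return hits, misses


if __name__ == "__main__":
    LINES, BLOCK = 64, 64  # assumed geometry: 64 lines x 64 bytes = 4 KiB
    sequential = range(0, 4096, 4)  # walk a 4 KiB array of 4-byte ints
    # Two 4 KiB arrays placed back to back, accessed alternately: a[i], b[i], ...
    interleaved = [a for i in range(1024) for a in (4 * i, 4096 + 4 * i)]
    print(simulate_direct_mapped(sequential, LINES, BLOCK))   # (960, 64)
    print(simulate_direct_mapped(interleaved, LINES, BLOCK))  # (0, 2048)
```

The sequential walk reuses each 64-byte block 16 times, so only the first access to each block misses. The two interleaved arrays are exactly 4 KiB apart, map to the same lines and evict each other on every access (conflict misses); a 2-way set-associative cache of the same size would hold both blocks and turn most of these into hits.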
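The PMU only counts events; ratios such as instructions per cycle or the fraction of cache references that missed are computed from those counts afterwards, similar to the annotations `perf stat` prints next to its raw counters. The sketch below assumes the counts were already collected; the `CounterSample` and `derived_metrics` names and the sample values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CounterSample:
    """Raw event counts, as a PMU-based tool might collect them."""
    cycles: int
    instructions: int
    cache_references: int
    cache_misses: int
    branches: int
    branch_misses: int


def _ratio(num: int, den: int) -> float:
    return num / den if den else 0.0  # avoid dividing by an empty counter


def derived_metrics(s: CounterSample) -> Dict[str, float]:
    """Turn raw counts into the ratios usually read off a profile."""
    return {
        "ipc": _ratio(s.instructions, s.cycles),
        "cache_miss_rate": _ratio(s.cache_misses, s.cache_references),
        "branch_miss_rate": _ratio(s.branch_misses, s.branches),
    }


if __name__ == "__main__":
    # Hypothetical counts, not a real measurement.
    sample = CounterSample(cycles=2_000_000, instructions=3_000_000,
                           cache_references=100_000, cache_misses=5_000,
                           branches=500_000, branch_misses=10_000)
    print(derived_metrics(sample))
```

With these made-up numbers the IPC is 1.5, meaning more than one instruction completed per cycle on average, which only a superscalar core can achieve. A low IPC together with a high cache-miss rate often suggests the core is waiting on memory rather than computing.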