Commit 3023819

zhangdanfeng committed
fix opencv thread
Signed-off-by: zhangdanfeng <[email protected]>
1 parent b771790 commit 3023819

8 files changed: +315 -181 lines changed

TEMP.md

Lines changed: 129 additions & 0 deletions
@@ -0,0 +1,129 @@
# Understanding the RISC-V Calling Convention

---

Assembly is entirely based on convention, so if you do not strictly follow the convention you will not have working code. Understanding the convention in RISC-V consists of 3 important parts: registers, function calls, and entering/exiting a function (prologue/epilogue).

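## registers

- x0/zero: always holds 0.
- x1/ra: holds the return address, an address in the code segment. When a function call completes, pc is restored to the address stored here.
- x2/sp: holds the address of the current top of the stack. The stack grows from high addresses toward low addresses and is last-in, first-out: allocating stack space subtracts from sp, and exiting the function adds the same amount back.
- x3/gp: global pointer.
- x4/tp: thread pointer.
- x5-x7/t0-t2 and x28-x31/t3-t6: hold temporary values that do not persist across function calls. After a call, the contents of a t register are not guaranteed to survive, so the callee's prologue does not need to save them. The caller must save any of t0-t6 (x5-x7, x28-x31) that it still needs.
- x8/s0/fp, x9/s1, and x18-x27/s2-s11: hold values that persist across function calls. A value stored in an s register is still valid after a call, so the callee's prologue must save every s register it uses, and the caller does not need to.
- x10-x11/a0-a1: hold the first two arguments to the function, or the return values.
- x12-x17/a2-a7: hold the next six arguments (any remaining argument words are passed in the parent's stack frame).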
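
The register list above can be condensed into a lookup table. The following is a minimal sketch, assuming the standard integer calling convention; `Saver`, `RegInfo`, `lookup_register`, and `register_number` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Who must preserve a register's value across a call.
enum class Saver { None, Caller, Callee };

struct RegInfo {
    int number;   // x-register index, 0..31
    Saver saver;  // preservation rule under the standard integer convention
};

// Map an ABI name such as "t3" or "s2" to its x-number and saver class.
// Returns std::nullopt for names this sketch does not know.
inline std::optional<RegInfo> lookup_register(std::string_view name) {
    if (name == "zero") return RegInfo{0, Saver::None};
    if (name == "ra") return RegInfo{1, Saver::Caller};
    if (name == "sp") return RegInfo{2, Saver::Callee};
    if (name == "gp") return RegInfo{3, Saver::None};
    if (name == "tp") return RegInfo{4, Saver::None};
    if (name == "fp") return RegInfo{8, Saver::Callee};  // alias of s0
    if (name.size() < 2 || name.size() > 3) return std::nullopt;
    if (name.size() == 3 && name[1] == '0') return std::nullopt;  // reject "t01"
    int idx = 0;
    for (char c : name.substr(1)) {
        if (c < '0' || c > '9') return std::nullopt;
        idx = idx * 10 + (c - '0');
    }
    switch (name[0]) {
    case 't':  // t0-t2 -> x5-x7, t3-t6 -> x28-x31
        if (idx <= 2) return RegInfo{5 + idx, Saver::Caller};
        if (idx <= 6) return RegInfo{28 + (idx - 3), Saver::Caller};
        break;
    case 's':  // s0-s1 -> x8-x9, s2-s11 -> x18-x27
        if (idx <= 1) return RegInfo{8 + idx, Saver::Callee};
        if (idx <= 11) return RegInfo{18 + (idx - 2), Saver::Callee};
        break;
    case 'a':  // a0-a7 -> x10-x17
        if (idx <= 7) return RegInfo{10 + idx, Saver::Caller};
        break;
    }
    return std::nullopt;
}

// Convenience wrapper for names that are known to be valid.
inline int register_number(std::string_view name) {
    auto info = lookup_register(name);
    assert(info.has_value() && "unknown register name");
    return info->number;
}
```

For instance, `register_number("s2")` yields 18, which shows why the s registers are not one contiguous x range.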
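
## function calls

A function call is usually made with the jal instruction, which jumps to a label, or with the jalr instruction, which jumps to an address held in a register:

`jal ra, label` or `jalr ra, rs1, imm`

- jal stores PC+4 in ra and jumps to the label.
- jalr also stores PC+4 in ra; its jump target is rs1+imm.

When a function is called, the arguments are passed in the a registers, and when the call finishes the return values are placed in a0-a1.
This also implies that the callee does not need to save or restore the a registers. A caller that still needs a value held in an a register after the call must save it itself.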
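
To make the linking behaviour concrete, here is a small sketch of what jal and jalr do to pc and the destination register. It assumes a 32-bit machine with 4-byte instructions; `Hart`, `kZero`, and `kRa` are hypothetical names for this illustration, not part of any real simulator.

```cpp
#include <cassert>
#include <cstdint>

// Minimal model of an RV32 hart: 32 integer registers and a program counter.
struct Hart {
    std::uint32_t x[32] = {};
    std::uint32_t pc = 0;

    // Writes to x0 are discarded, so x0 always reads as 0.
    void write(int rd, std::uint32_t value) {
        assert(rd >= 0 && rd < 32);
        if (rd != 0) x[rd] = value;
    }

    // jal rd, offset: link PC+4 into rd, then jump PC-relative.
    void jal(int rd, std::int32_t offset) {
        std::uint32_t link = pc + 4;
        pc = pc + static_cast<std::uint32_t>(offset);
        write(rd, link);
    }

    // jalr rd, rs1, imm: link PC+4 into rd, then jump to x[rs1] + imm
    // with the lowest bit cleared.
    void jalr(int rd, int rs1, std::int32_t imm) {
        assert(rs1 >= 0 && rs1 < 32);
        std::uint32_t target =
            (x[rs1] + static_cast<std::uint32_t>(imm)) & ~std::uint32_t{1};
        write(rd, pc + 4);
        pc = target;
    }
};

constexpr int kZero = 0;  // x0
constexpr int kRa = 1;    // x1
```

With pc at 0x1000, `jal(kRa, 0x40)` leaves pc at 0x1040 and ra at 0x1004. A following `jalr(kZero, kRa, 0)` returns to 0x1004, which is what `jr ra` expands to.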
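
## prologue/epilogue

- On exit, sp (after adding to release the space) must hold the same value it had on entry (before subtracting to allocate the space).
- Every s register the function uses must be saved and restored.
- After the function exits, pc points to the value in ra.

To implement these conventions:
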
```
def prologue():
    Decrement sp by num s registers + local var space   # allocate stack space for the s registers used and the local variables
    Store any saved registers used                       # save the s registers that will be used
    Store ra if a function call is made                  # ra must be saved if a function call follows

def epilogue():
    Reload any saved registers used                      # restore the values of the saved registers
    Reload ra                                            # restore ra if it was saved
    Increment sp back to previous value                  # add to sp to release the stack space
    Jump back to return address                          # jump to ra

# Example
sumsquares:
prologue:
    addi sp, sp, -16      # allocate 16 bytes: s0, s1, s2, ra
    sw   s0, 0(sp)
    sw   s1, 4(sp)
    sw   s2, 8(sp)
    sw   ra, 12(sp)

    li   s0, 1            # s0 = loop counter
    mv   s1, a0           # s1 = the argument
    li   s2, 0            # s2 = running sum

loopstart:
    bge  s0, s1, loopend
    mv   a0, s0
    jal  square
    add  s2, s2, a0
    addi s0, s0, 1
    j    loopstart

loopend:
    mv   a0, s2           # return value

epilogue:
    lw   s0, 0(sp)
    lw   s1, 4(sp)
    lw   s2, 8(sp)
    lw   ra, 12(sp)
    addi sp, sp, 16
    jr   ra
```
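
Read as C++, the loop in the example above adds up `square(i)` for `i` from 1 up to, but not including, the argument. The sketch below mirrors the register roles one-to-one. The body of `square` is an assumption, since the notes only call it and never define it.

```cpp
// Hypothetical stand-in for the `square` routine that the assembly calls.
int square(int v) { return v * v; }

// Mirrors the register roles in the assembly: s0 is the loop counter,
// s1 keeps the argument, s2 accumulates the sum.
int sumsquares(int n) {
    int s0 = 1;
    int s1 = n;
    int s2 = 0;
    while (s0 < s1) {      // bge s0, s1, loopend
        s2 += square(s0);  // mv a0, s0; jal square; add s2, s2, a0
        s0 += 1;           // addi s0, s0, 1
    }
    return s2;             // mv a0, s2
}
```

The values live in s registers rather than t registers because they must survive the call to `square`. That is also why the prologue has to save s0-s2 and ra.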

Convention checklist for assembly code

- Check that you stored ra properly. For recursion, make sure you link ra for each recursive call. You can test this by putting a breakpoint at the end of the epilogue and seeing where you return.
- Check that you don't use any t registers after a function call.
- Check that sp enters and exits with the same value.
- Check that the number of times you enter the prologue equals the number of times you enter the epilogue.
- Make sure you restore every register you modified.

## Introducing the call stack, aka stack frames

Each activation record contains:
- the return address for that invocation
- the local variables for that procedure

A stack pointer (sp) keeps track of the top of the stack:
- it is a dedicated register (x2) on RISC-V

The stack is manipulated by push/pop operations:
- push: move sp down, store
- pop: load, move sp up
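
The push/pop rule can be sketched as a toy descending stack. This is only an illustration, assuming 4-byte words and a byte array standing in for memory; the `Stack` class and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy descending stack: sp starts at the high end and moves down on push.
class Stack {
public:
    explicit Stack(std::size_t bytes) : mem_(bytes), sp_(bytes) {}

    // push: move sp down, then store the word at the new top.
    void push(std::uint32_t word) {
        assert(sp_ >= sizeof(word) && "stack overflow");
        sp_ -= sizeof(word);
        std::memcpy(&mem_[sp_], &word, sizeof(word));
    }

    // pop: load the word at the top, then move sp up.
    std::uint32_t pop() {
        assert(sp_ + sizeof(std::uint32_t) <= mem_.size() && "stack underflow");
        std::uint32_t word;
        std::memcpy(&word, &mem_[sp_], sizeof(word));
        sp_ += sizeof(word);
        return word;
    }

    std::size_t sp() const { return sp_; }

private:
    std::vector<std::uint8_t> mem_;
    std::size_t sp_;  // offset of the current top of the stack within mem_
};
```

Pushing two words onto a 64-byte stack moves `sp()` from 64 down to 56. Popping them returns the values in reverse order and brings `sp()` back to 64, which is the balance the checklist asks for.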

(Call) stacks start at a high address in memory and grow down as frames are pushed on.
- Note: the data region starts at a low address and grows up.
- The growth potential of the stack and the data region is not artificially limited.

The return address lives in the stack frame. The stack contains stack frames (aka "activation records"):
- one stack frame per dynamic function invocation
- a frame exists only for the duration of the function
- the stack grows down; the "top" of the stack is sp (x2)
- example: `lw x5, 0(sp)` puts the word at the top of the stack into x5

Each stack frame contains:
- local variables, the return address (later), and register backups (later)

First eight arguments:
- passed in registers x10-x17, aka a0, a1, …, a7

Subsequent arguments:
- "spill" onto the stack
- these argument words are passed in the parent's (caller's) stack frame, directly above the child's frame
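
As a rough illustration of the argument rule, the sketch below reports where the i-th integer argument is found on entry to the callee. It assumes every argument is a single word (4 bytes on RV32, 8 on RV64) and that the first stack-passed argument sits at the sp value on entry. `argument_location` is a hypothetical helper, and wider or floating-point arguments are not modelled.

```cpp
#include <cassert>
#include <string>

// Where the i-th (0-based) word-sized integer argument is found on entry
// to the callee. word_bytes is 4 on RV32 and 8 on RV64.
std::string argument_location(int index, int word_bytes = 4) {
    assert(index >= 0);
    if (index < 8) {
        // a0..a7 are x10..x17
        return "a" + std::to_string(index);
    }
    // Arguments beyond the eighth sit at increasing offsets from the
    // sp value on entry.
    int offset = (index - 8) * word_bytes;
    return std::to_string(offset) + "(sp)";
}
```

Under these assumptions the ninth argument (index 8) is read with `lw t0, 0(sp)` before the callee's prologue moves sp. After the prologue, the frame size must be added to that offset.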

x8, aka fp (also known as s0), can be used to restore sp on exit.

classification.cc

Lines changed: 69 additions & 61 deletions
@@ -15,6 +15,13 @@
1515
* limitations under the License.
1616
*/
1717

18+
/*
19+
rm classification/tflite_classification & make -f Makefile-rv
20+
qemu-riscv64 classification/tflite_classification -m \
21+
classification/imagenet_mobilenet_v1_100_224_classification.tflite -i \
22+
classification/dog.jpg -l classification/labels.txt -c 1 -b 0 -s 255 -t 1
23+
*/
24+
1825
#include <getopt.h>
1926
#include <libgen.h>
2027
#include <memory.h>
@@ -61,7 +68,8 @@ bool getFileContent(std::string fileName, std::vector<std::string> &vecOfStrs) {
6168
// Read the next line from File untill it reaches the end.
6269
while (std::getline(in, str)) {
6370
// Line contains string of length > 0 then save it in vector
64-
if (str.size() > 0) vecOfStrs.push_back(str);
71+
if (str.size() > 0)
72+
vecOfStrs.push_back(str);
6573
}
6674
// Close The File
6775
in.close();
@@ -75,12 +83,12 @@ void DisplayFrames(char *display_win, int input_source, Mat &show_image,
7583
std::string &output_labels) {
7684
// overlay the display window
7785
cv::putText(show_image, output_labels.c_str(),
78-
cv::Point(32, 32), // Coordinates
79-
cv::FONT_HERSHEY_COMPLEX_SMALL, // Font
80-
1.25, // Scale. 2.0 = 2x bigger
81-
cv::Scalar(0, 0, 0), // Color
82-
1.5, // Thickness
83-
8); // Line type
86+
cv::Point(32, 32), // Coordinates
87+
cv::FONT_HERSHEY_COMPLEX_SMALL, // Font
88+
1.25, // Scale. 2.0 = 2x bigger
89+
cv::Scalar(0, 0, 0), // Color
90+
1.5, // Thickness
91+
8); // Line type
8492
cv::imshow(display_win, show_image);
8593

8694
if (input_source == INPUT_Image)
@@ -144,41 +152,42 @@ int main(int argc, char **argv) {
144152
&option_index);
145153

146154
/* Detect the end of the options. */
147-
if (c == -1) break;
155+
if (c == -1)
156+
break;
148157

149158
switch (c) {
150-
case 'b':
151-
input_mean = strtod(optarg, nullptr);
152-
break;
153-
case 'c':
154-
frame_cnt = strtol(optarg, nullptr, 10);
155-
break;
156-
case 'i':
157-
input_path = optarg;
158-
break;
159-
case 'l':
160-
label_path = optarg;
161-
break;
162-
case 'm':
163-
model_path = optarg;
164-
break;
165-
case 'p':
166-
profiling = strtol(optarg, nullptr, 10);
167-
break;
168-
case 'r':
169-
input_source = (eInputType)strtol(optarg, nullptr, 10);
170-
break;
171-
case 's':
172-
input_std = strtod(optarg, nullptr);
173-
break;
174-
case 't':
175-
num_threads = strtol(optarg, nullptr, 10);
176-
break;
177-
case 'h':
178-
display_usage();
179-
exit(-1);
180-
default:
181-
exit(-1);
159+
case 'b':
160+
input_mean = strtod(optarg, nullptr);
161+
break;
162+
case 'c':
163+
frame_cnt = strtol(optarg, nullptr, 10);
164+
break;
165+
case 'i':
166+
input_path = optarg;
167+
break;
168+
case 'l':
169+
label_path = optarg;
170+
break;
171+
case 'm':
172+
model_path = optarg;
173+
break;
174+
case 'p':
175+
profiling = strtol(optarg, nullptr, 10);
176+
break;
177+
case 'r':
178+
input_source = (eInputType)strtol(optarg, nullptr, 10);
179+
break;
180+
case 's':
181+
input_std = strtod(optarg, nullptr);
182+
break;
183+
case 't':
184+
num_threads = strtol(optarg, nullptr, 10);
185+
break;
186+
case 'h':
187+
display_usage();
188+
exit(-1);
189+
default:
190+
exit(-1);
182191
}
183192
}
184193

@@ -266,28 +275,27 @@ int main(int argc, char **argv) {
266275
// Prepare the input for the inference
267276
int input = interpreter->inputs()[0];
268277
switch (interpreter->tensor(input)->type) {
269-
case kTfLiteFloat32:
270-
std::cout << "kTfLiteFloat32" << std::endl;
271-
PrepareInput<float>(interpreter->typed_tensor<float>(input),
272-
input_frame, input_number_of_pixels, true,
278+
case kTfLiteFloat32:
279+
std::cout << "kTfLiteFloat32" << std::endl;
280+
PrepareInput<float>(interpreter->typed_tensor<float>(input), input_frame,
281+
input_number_of_pixels, true, input_mean, input_std);
282+
break;
283+
case kTfLiteUInt8:
284+
std::cout << "kTfLiteUInt8" << std::endl;
285+
PrepareInput<uint8_t>(interpreter->typed_tensor<uint8_t>(input),
286+
input_frame, input_number_of_pixels, false,
273287
input_mean, input_std);
274-
break;
275-
case kTfLiteUInt8:
276-
std::cout << "kTfLiteUInt8" << std::endl;
277-
PrepareInput<uint8_t>(interpreter->typed_tensor<uint8_t>(input),
278-
input_frame, input_number_of_pixels, false,
279-
input_mean, input_std);
280-
break;
281-
case kTfLiteInt8:
282-
std::cout << "kTfLiteInt8" << std::endl;
283-
PrepareInput<int8_t>(interpreter->typed_tensor<int8_t>(input),
284-
input_frame, input_number_of_pixels, false,
285-
input_mean, input_std);
286-
break;
287-
default:
288-
cout << "cannot handle input type " << interpreter->tensor(input)->type
289-
<< " yet" << std::endl;
290-
exit(-1);
288+
break;
289+
case kTfLiteInt8:
290+
std::cout << "kTfLiteInt8" << std::endl;
291+
PrepareInput<int8_t>(interpreter->typed_tensor<int8_t>(input),
292+
input_frame, input_number_of_pixels, false,
293+
input_mean, input_std);
294+
break;
295+
default:
296+
cout << "cannot handle input type " << interpreter->tensor(input)->type
297+
<< " yet" << std::endl;
298+
exit(-1);
291299
}
292300

293301
// Running the inference
Binary file not shown.
Binary file not shown.

detection.cc

Lines changed: 35 additions & 34 deletions
@@ -49,7 +49,7 @@ using namespace std;
4949
*/
5050
void display_usage() {
5151
std:
52-
cout << "tflite_segmentation\n"
52+
cout << "tflite_detection\n"
5353
<< "--tflite_model, -m: model_name.tflite\n"
5454
<< "--label_file, -l: label_file\n"
5555
<< "--input_src, -r: [0|1|2] input source: image 0, video 1, camera 2\n"
@@ -99,41 +99,42 @@ int main(int argc, char **argv) {
9999
&option_index);
100100

101101
/* Detect the end of the options. */
102-
if (c == -1) break;
102+
if (c == -1)
103+
break;
103104

104105
switch (c) {
105-
case 'b':
106-
input_mean = strtod(optarg, nullptr);
107-
break;
108-
case 'c':
109-
frame_cnt = strtol(optarg, nullptr, 10);
110-
break;
111-
case 'i':
112-
input_path = optarg;
113-
break;
114-
case 'm':
115-
model_path = optarg;
116-
break;
117-
case 'l':
118-
label_path = optarg;
119-
break;
120-
case 'p':
121-
profiling = strtol(optarg, nullptr, 10);
122-
break;
123-
case 'r':
124-
input_source = (eInputType)strtol(optarg, nullptr, 10);
125-
break;
126-
case 's':
127-
input_std = strtod(optarg, nullptr);
128-
break;
129-
case 't':
130-
num_threads = strtol(optarg, nullptr, 10);
131-
break;
132-
case 'h':
133-
display_usage();
134-
exit(-1);
135-
default:
136-
exit(-1);
106+
case 'b':
107+
input_mean = strtod(optarg, nullptr);
108+
break;
109+
case 'c':
110+
frame_cnt = strtol(optarg, nullptr, 10);
111+
break;
112+
case 'i':
113+
input_path = optarg;
114+
break;
115+
case 'm':
116+
model_path = optarg;
117+
break;
118+
case 'l':
119+
label_path = optarg;
120+
break;
121+
case 'p':
122+
profiling = strtol(optarg, nullptr, 10);
123+
break;
124+
case 'r':
125+
input_source = (eInputType)strtol(optarg, nullptr, 10);
126+
break;
127+
case 's':
128+
input_std = strtod(optarg, nullptr);
129+
break;
130+
case 't':
131+
num_threads = strtol(optarg, nullptr, 10);
132+
break;
133+
case 'h':
134+
display_usage();
135+
exit(-1);
136+
default:
137+
exit(-1);
137138
}
138139
}
139140

scripts/cv/build-linux-riscv64-JDSK.sh

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ CMAKE_ARGS+=("-DBUILD_TESTS=OFF")
2222
CMAKE_ARGS+=("-DBUILD_opencv_apps=OFF")
2323
CMAKE_ARGS+=("-DBUILD_opencv_calib3d=OFF")
2424
CMAKE_ARGS+=("-DBUILD_opencv_gapi=OFF")
25+
CMAKE_ARGS+=("-DWITH_PTHREADS_PF=OFF")
2526

2627
# install
2728
# CMAKE_INSTALL_PREFIX
