Commit d081b58

modifications

(1) add the letterbox resize method (2) kmeans script updated (3) annotation format changed (4) add gradient clipping

1 parent: e7a314e

11 files changed: +157 −83 lines
README.md (+11 −11)

````diff
@@ -86,13 +86,13 @@ For better understanding of the model architecture, you can refer to the following
 
 (1) annotation file
 
-Generate `train.txt/val.txt/test.txt` files under `./data/my_data/` directory. One line for one image, in the format like `image_index image_absolute_path box_1 box_2 ... box_n`. Box_x format: `label_index x_min y_min x_max y_max`. (The origin of coordinates is at the left top corner, left top => (xmin, ymin), right bottom => (xmax, ymax).) `image_index` is the line index which starts from zero. `label_index` is in range [0, class_num - 1].
+Generate `train.txt/val.txt/test.txt` files under `./data/my_data/` directory. One line for one image, in the format like `image_index image_absolute_path img_width img_height box_1 box_2 ... box_n`. Box_x format: `label_index x_min y_min x_max y_max`. (The origin of coordinates is at the left top corner, left top => (xmin, ymin), right bottom => (xmax, ymax).) `image_index` is the line index which starts from zero. `label_index` is in range [0, class_num - 1].
 
 For example:
 
 ```
-0 xxx/xxx/a.jpg 0 453 369 473 391 1 588 245 608 268
-1 xxx/xxx/b.jpg 1 466 403 485 422 2 793 300 809 320
+0 xxx/xxx/a.jpg 300 400 0 453 369 473 391 1 588 245 608 268
+1 xxx/xxx/b.jpg 300 400 1 466 403 485 422 2 793 300 809 320
 ...
 ```
 
````
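The new format prepends `img_width img_height` after the image path. To make the layout concrete, here is a minimal parsing sketch; `parse_annotation_line` is a hypothetical helper, not a function from this commit:

```python
# Hypothetical helper: parse one line of the new annotation format
# `image_index image_absolute_path img_width img_height box_1 ... box_n`,
# where each box is `label_index x_min y_min x_max y_max`.
def parse_annotation_line(line):
    s = line.strip().split(' ')
    img_index, img_path = int(s[0]), s[1]
    img_width, img_height = int(s[2]), int(s[3])
    boxes = []
    for i in range(4, len(s), 5):
        label_index = int(s[i])
        x_min, y_min, x_max, y_max = map(float, s[i + 1:i + 5])
        boxes.append((label_index, x_min, y_min, x_max, y_max))
    return img_index, img_path, (img_width, img_height), boxes

# parse_annotation_line('0 xxx/xxx/a.jpg 300 400 0 453 369 473 391')
# -> (0, 'xxx/xxx/a.jpg', (300, 400), [(0, 453.0, 369.0, 473.0, 391.0)])
```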

@@ -123,7 +123,7 @@ Then you will get 9 anchors and the average IoU. Save the anchors to a txt file.
123123

124124
The COCO dataset anchors offered by YOLO's author is placed at `./data/yolo_anchors.txt`, you can use that one too.
125125

126-
**NOTE: The yolo anchors computed by the kmeans script is on the original image scale. You may need to resize the anchors to your target training image size before training and write them to the anchors txt file. Then you should not modify the anchors later.**
126+
The yolo anchors computed by the kmeans script is on the resized image scale. The default resize method is the letterbox resize, i.e., keep the original aspect ratio in the resized image.
127127

128128
#### 7.2 Training
129129

@@ -164,19 +164,19 @@ For higher mAP, you should set score_threshold to a small number.
164164
165165
Here are some training tricks in my experiment:
166166
167-
(1) Apply the two-stage training strategy:
167+
(1) Apply the two-stage training strategy or the one-stage training strategy:
168+
169+
Two-stage training:
168170
169171
First stage: Restore `darknet53_body` part weights from COCO checkpoints, train the `yolov3_head` with big learning rate like 1e-3 until the loss reaches to a low level.
170172
171173
Second stage: Restore the weights from the first stage, then train the whole model with small learning rate like 1e-4 or smaller. At this stage remember to restore the optimizer parameters if you use optimizers like adam.
172174
173-
Or just restore the whole weight file except the last three convolution layers.
174-
175-
(2) Quick train:
175+
One-stage training:
176176
177-
If you want to obtain acceptable results in a short time like in 10 minutes. You can use the coco names but substitute several with real class names in your dataset. In this way you restore the whole pretrained COCO model and get a 80 class classification model, but you only care about the class names from your dataset.
177+
Just restore the whole weight file except the last three convolution layers (Conv_6, Conv_14, Conv_22). In this condition, be careful about the possible nan loss value.
178178
179-
(3) I've included many useful training strategies in `args.py`:
179+
(2) I've included many useful training strategies in `args.py`:
180180
181181
- Cosine decay of lr (SGDR)
182182
- Multi-scale training
@@ -188,7 +188,7 @@ These are all good strategies but it does **not** mean they will definitely impr
188188
189189
This [paper](https://arxiv.org/abs/1902.04103) from gluon-cv has proved that data augmentation is critical to YOLO v3, which is completely in consistent with my own experiments. Some data augmentation strategies that seems reasonable may lead to poor performance. For example, after introducing random color jittering, the mAP on my own dataset drops heavily. Thus I hope you pay extra attention to the data augmentation.
190190
191-
(4) Loss nan? Setting a bigger warm_up_epoch number or less learning rate and try several more times. If you fine-tune the whole model, using adam may cause nan value sometimes. You can try choosing momentum optimizer.
191+
(4) Loss nan? Setting a bigger warm_up_epoch number or smaller learning rate and try several more times. If you fine-tune the whole model, using adam may cause nan value sometimes. You can try choosing momentum optimizer.
192192
193193
### 10. TODO
194194
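For the one-stage strategy above, restoring everything except the three detection-head convolutions might look like the sketch below. The keywords `Conv_6`, `Conv_14`, `Conv_22` come from the README text; treating them as substrings of the checkpoint variable names is an assumption, so verify against the actual variable names first:

```python
import tensorflow as tf

# Sketch (assumed variable naming): skip the three detection convolutions,
# whose output channels depend on class_num and cannot be reused across
# datasets with a different number of classes.
exclude_keywords = ['Conv_6', 'Conv_14', 'Conv_22']
restore_vars = [v for v in tf.global_variables()
                if not any(k in v.name for k in exclude_keywords)]
saver_to_restore = tf.train.Saver(var_list=restore_vars)
# later, inside a session:
# saver_to_restore.restore(sess, './data/darknet_weights/yolov3.ckpt')
```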

args.py (+5 −3)

```diff
@@ -19,9 +19,10 @@
 ### Training related numbers
 batch_size = 20
 img_size = [416, 416]  # Images will be resized to `img_size` and fed to the network, size format: [width, height]
+letterbox_resize = False  # Whether to use the letterbox resize, i.e., keep the original aspect ratio in the resized image.
 total_epoches = 200
 train_evaluation_step = 100  # Evaluate on the training batch after some steps.
-val_evaluation_epoch = 1  # Evaluate on the whole validation dataset after some steps. Set to None to evaluate every epoch.
+val_evaluation_epoch = 1  # Evaluate on the whole validation dataset after some epochs. Set to None to evaluate every epoch.
 save_epoch = 10  # Save the model after some epochs.
 batch_norm_decay = 0.99  # decay in bn ops
 weight_decay = 5e-4  # l2 weight decay
@@ -32,10 +33,10 @@
 prefetech_buffer = 5  # Prefetech_buffer used in tf.data pipeline.
 
 ### Learning rate and optimizer
-optimizer_name = 'adam'  # Chosen from [sgd, momentum, adam, rmsprop]
+optimizer_name = 'momentum'  # Chosen from [sgd, momentum, adam, rmsprop]
 save_optimizer = True  # Whether to save the optimizer parameters into the checkpoint file.
 learning_rate_init = 1e-3
-lr_type = 'exponential'  # Chosen from [fixed, exponential, cosine_decay, cosine_decay_restart, piecewise]
+lr_type = 'piecewise'  # Chosen from [fixed, exponential, cosine_decay, cosine_decay_restart, piecewise]
 lr_decay_epoch = 5  # Epochs after which learning rate decays. Int or float. Used when chosen `exponential` and `cosine_decay_restart` lr_type.
 lr_decay_factor = 0.96  # The learning rate decay factor. Used when chosen `exponential` lr_type.
 lr_lower_bound = 1e-6  # The minimum learning rate.
@@ -73,6 +74,7 @@
 nms_topk = 150  # keep at most nms_topk outputs after nms
 # mAP eval
 eval_threshold = 0.5  # the iou threshold applied in mAP evaluation
+use_voc_07_metric = True  # whether to use voc 2007 evaluation metric, i.e. the 11-point metric
 
 ### parse some params
 anchors = parse_anchors(anchor_path)
```
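For context on the new defaults: `lr_type = 'piecewise'` usually means a piecewise-constant schedule, which in TF1 can be built as below. The boundaries and values here are illustrative only, not the settings from `args.py`:

```python
import tensorflow as tf

global_step = tf.Variable(0, trainable=False, name='global_step')
boundaries = [40000, 60000]      # illustrative: global steps at which the lr drops
lr_values = [1e-3, 3e-4, 1e-4]   # illustrative: lr used inside each interval
learning_rate = tf.train.piecewise_constant(global_step, boundaries, lr_values)
optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.9)
```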

eval.py (+10 −4)

```diff
@@ -22,7 +22,7 @@
 parser.add_argument("--eval_file", type=str, default="./data/my_data/val.txt",
                     help="The path of the validation or test txt file.")
 
-parser.add_argument("--restore_path", type=str, default="/home/user/Documents/chenyang_projects/yolo_v3_voc/YOLOv3_TensorFlow_old/No_data_aug_bn_0.9/best_model_Epoch_52_step_43200.0_mAP_0.6752_loss_20.220579_lr_0.0005882013",
+parser.add_argument("--restore_path", type=str, default="./data/darknet_weights/yolov3.ckpt",
                     help="The path of the weights to restore.")
 
 parser.add_argument("--anchor_path", type=str, default="./data/yolo_anchors.txt",
@@ -35,6 +35,9 @@
 parser.add_argument("--img_size", nargs='*', type=int, default=[416, 416],
                     help="Resize the input image to `img_size`, size format: [width, height]")
 
+parser.add_argument("--letterbox_resize", type=lambda x: (str(x).lower() == 'true'), default=False,
+                    help="Whether to use the letterbox resize.")
+
 parser.add_argument("--num_threads", type=int, default=10,
                     help="Number of threads for image processing used in tf.data pipeline.")
 
@@ -50,6 +53,9 @@
 parser.add_argument("--nms_topk", type=int, default=400,
                     help="Keep at most nms_topk outputs after nms.")
 
+parser.add_argument("--use_voc_07_metric", type=lambda x: (str(x).lower() == 'true'), default=True,
+                    help="Whether to use the voc 2007 mAP metric.")
+
 args = parser.parse_args()
 
 # args params
@@ -71,7 +77,7 @@
 val_dataset = tf.data.TextLineDataset(args.eval_file)
 val_dataset = val_dataset.batch(1)
 val_dataset = val_dataset.map(
-    lambda x: tf.py_func(get_batch_data, [x, args.class_num, args.img_size, args.anchors, 'val'], [tf.int64, tf.float32, tf.float32, tf.float32, tf.float32]),
+    lambda x: tf.py_func(get_batch_data, [x, args.class_num, args.img_size, args.anchors, 'val', False, False, args.letterbox_resize], [tf.int64, tf.float32, tf.float32, tf.float32, tf.float32]),
     num_parallel_calls=args.num_threads
 )
 val_dataset.prefetch(args.prefetech_buffer)
@@ -117,10 +123,10 @@
         val_loss_class.update(__loss[4])
 
     rec_total, prec_total, ap_total = AverageMeter(), AverageMeter(), AverageMeter()
-    gt_dict = parse_gt_rec(args.eval_file, args.img_size)
+    gt_dict = parse_gt_rec(args.eval_file, args.img_size, args.letterbox_resize)
     print('mAP eval:')
     for ii in range(args.class_num):
-        npos, nd, rec, prec, ap = voc_eval(gt_dict, val_preds, ii, iou_thres=0.5, use_07_metric=False)
+        npos, nd, rec, prec, ap = voc_eval(gt_dict, val_preds, ii, iou_thres=0.5, use_07_metric=args.use_voc_07_metric)
         rec_total.update(rec, npos)
         prec_total.update(prec, nd)
         ap_total.update(ap, 1)
```
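The new `use_voc_07_metric` flag switches `voc_eval` to the PASCAL VOC 2007 11-point interpolated AP. For reference, the metric itself reduces to the following (a standalone sketch, independent of the repo's `voc_eval` implementation):

```python
import numpy as np

def voc_07_ap(rec, prec):
    # 11-point metric from the VOC 2007 devkit: average, over the recall
    # thresholds 0.0, 0.1, ..., 1.0, the maximum precision achieved at
    # recall >= threshold (0 if that recall level is never reached).
    ap = 0.
    for t in np.arange(0., 1.1, 0.1):
        p = np.max(prec[rec >= t]) if np.any(rec >= t) else 0.
        ap += p / 11.
    return ap
```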

get_kmeans.py (+22 −6)

```diff
@@ -23,7 +23,8 @@ def iou(box, clusters):
     box_area = box[0] * box[1]
     cluster_area = clusters[:, 0] * clusters[:, 1]
 
-    iou_ = intersection / (box_area + cluster_area - intersection + 1e-10)
+    iou_ = np.true_divide(intersection, box_area + cluster_area - intersection + 1e-10)
+    # iou_ = intersection / (box_area + cluster_area - intersection + 1e-10)
 
     return iou_
 
@@ -92,20 +93,31 @@ def kmeans(boxes, k, dist=np.median):
     return clusters
 
 
-def parse_anno(annotation_path):
+def parse_anno(annotation_path, target_size=None):
     anno = open(annotation_path, 'r')
     result = []
     for line in anno:
         s = line.strip().split(' ')
-        s = s[2:]
+        img_w = int(s[2])
+        img_h = int(s[3])
+        s = s[4:]
         box_cnt = len(s) // 5
         for i in range(box_cnt):
             x_min, y_min, x_max, y_max = float(s[i*5+1]), float(s[i*5+2]), float(s[i*5+3]), float(s[i*5+4])
             width = x_max - x_min
             height = y_max - y_min
             assert width > 0
             assert height > 0
-            result.append([width, height])
+            # use letterbox resize, i.e. keep the original aspect ratio
+            # get k-means anchors on the resized target image size
+            if target_size is not None:
+                resize_ratio = min(target_size[0] / img_w, target_size[1] / img_h)
+                width *= resize_ratio
+                height *= resize_ratio
+                result.append([width, height])
+            # get k-means anchors on the original image size
+            else:
+                result.append([width, height])
     result = np.asarray(result)
     return result
 
@@ -123,8 +135,12 @@ def get_kmeans(anno, cluster_num=9):
 
 
 if __name__ == '__main__':
-    annotation_path = "./data/my_data/train.txt"
-    anno_result = parse_anno(annotation_path)
+    # target_size format: [width, height]
+    # if target_size is specified, the anchors are on the resized image scale
+    # if target_size is set to None, the anchors are on the original image scale
+    target_size = [416, 416]
+    annotation_path = "train.txt"
+    anno_result = parse_anno(annotation_path, target_size=target_size)
     anchors, ave_iou = get_kmeans(anno_result, 9)
 
     anchor_string = ''
```
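With letterbox resizing, every box in an image is scaled by the single ratio `min(target_w / img_w, target_h / img_h)`, so the clustered anchors land on the network input scale. A quick numeric check with made-up values:

```python
# A 300x400 (width x height) image letterboxed into 416x416:
img_w, img_h = 300, 400
target_w, target_h = 416, 416
resize_ratio = min(target_w / img_w, target_h / img_h)  # min(1.3867, 1.04) = 1.04

# so a 20x22 ground-truth box contributes a 20.8x22.88 sample to k-means
print(20 * resize_ratio, 22 * resize_ratio)
```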

model.py (+2 −0)

```diff
@@ -215,6 +215,8 @@ def loss_layer(self, feature_map_i, y_true, anchors):
         # [N, 13, 13, 3, 1]
         object_mask = y_true[..., 4:5]
 
+        # the calculation of the ignore mask is referred from
+        # https://github.com/pjreddie/darknet/blob/master/src/yolo_layer.c#L179
         ignore_mask = tf.TensorArray(tf.float32, size=0, dynamic_size=True)
         def loop_cond(idx, ignore_mask):
             return tf.less(idx, tf.cast(N, tf.int32))
```
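For readers unfamiliar with the referenced darknet code: a predicted box whose best IoU against any ground-truth box exceeds a threshold is excluded from the no-object confidence loss. A NumPy sketch of that idea follows; the repo computes it per image inside a `tf.while_loop`, and the 0.5 threshold and corner-format boxes here are illustrative assumptions:

```python
import numpy as np

def iou_matrix(a, b):
    # a: [N, 4], b: [M, 4], boxes given as (x_min, y_min, x_max, y_max)
    tl = np.maximum(a[:, None, :2], b[None, :, :2])  # intersection top-left
    br = np.minimum(a[:, None, 2:], b[None, :, 2:])  # intersection bottom-right
    inter = np.prod(np.clip(br - tl, 0, None), axis=2)
    area_a = np.prod(a[:, 2:] - a[:, :2], axis=1)
    area_b = np.prod(b[:, 2:] - b[:, :2], axis=1)
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-10)

def ignore_mask(pred_boxes, gt_boxes, thresh=0.5):
    # 1.0 where a prediction overlaps no ground truth strongly enough,
    # i.e. where the no-object confidence loss still applies
    best_iou = iou_matrix(pred_boxes, gt_boxes).max(axis=1)
    return (best_iou < thresh).astype(np.float32)
```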

test_single_image.py (+14 −6)

```diff
@@ -10,6 +10,7 @@
 from utils.misc_utils import parse_anchors, read_class_names
 from utils.nms_utils import gpu_nms
 from utils.plot_utils import get_color_table, plot_one_box
+from utils.data_aug import letterbox_resize
 
 from model import yolov3
 
@@ -20,6 +21,8 @@
                     help="The path of the anchor txt file.")
 parser.add_argument("--new_size", nargs='*', type=int, default=[416, 416],
                     help="Resize the input image with `new_size`, size format: [width, height]")
+parser.add_argument("--letterbox_resize", type=lambda x: (str(x).lower() == 'true'), default=False,
+                    help="Whether to use the letterbox resize.")
 parser.add_argument("--class_name_path", type=str, default="./data/coco.names",
                     help="The path of the class names.")
 parser.add_argument("--restore_path", type=str, default="./data/darknet_weights/yolov3.ckpt",
@@ -33,8 +36,11 @@
 color_table = get_color_table(args.num_class)
 
 img_ori = cv2.imread(args.input_image)
-height_ori, width_ori = img_ori.shape[:2]
-img = cv2.resize(img_ori, tuple(args.new_size))
+if args.letterbox_resize:
+    img, resize_ratio, dw, dh = letterbox_resize(img_ori, args.new_size[0], args.new_size[1])
+else:
+    height_ori, width_ori = img_ori.shape[:2]
+    img = cv2.resize(img_ori, tuple(args.new_size))
 img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
 img = np.asarray(img, np.float32)
 img = img[np.newaxis, :] / 255.
@@ -56,10 +62,12 @@
     boxes_, scores_, labels_ = sess.run([boxes, scores, labels], feed_dict={input_data: img})
 
     # rescale the coordinates to the original image
-    boxes_[:, 0] *= (width_ori/float(args.new_size[0]))
-    boxes_[:, 2] *= (width_ori/float(args.new_size[0]))
-    boxes_[:, 1] *= (height_ori/float(args.new_size[1]))
-    boxes_[:, 3] *= (height_ori/float(args.new_size[1]))
+    if args.letterbox_resize:
+        boxes_[:, [0, 2]] = (boxes_[:, [0, 2]] - dw) / resize_ratio
+        boxes_[:, [1, 3]] = (boxes_[:, [1, 3]] - dh) / resize_ratio
+    else:
+        boxes_[:, [0, 2]] *= (width_ori/float(args.new_size[0]))
+        boxes_[:, [1, 3]] *= (height_ori/float(args.new_size[1]))
 
     print("box coords:")
     print(boxes_)
```
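The letterbox branch simply inverts the forward transform: subtract the padding offsets, then divide by the scale. A numeric check under assumed values:

```python
# Forward letterbox geometry for a 300x400 (width x height) image into 416x416:
resize_ratio = min(416 / 300, 416 / 400)               # 1.04
resize_w, resize_h = int(1.04 * 300), int(1.04 * 400)  # 312, 416
dw, dh = (416 - 312) // 2, (416 - 416) // 2            # 52, 0

# A predicted x-coordinate of 260 on the 416x416 canvas maps back to:
x_original = (260 - dw) / resize_ratio                 # (260 - 52) / 1.04 = 200.0
```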

train.py (+11 −6)

```diff
@@ -36,7 +36,7 @@
 train_dataset = train_dataset.batch(args.batch_size)
 train_dataset = train_dataset.map(
     lambda x: tf.py_func(get_batch_data,
-                         inp=[x, args.class_num, args.img_size, args.anchors, 'train', args.multi_scale_train, args.use_mix_up],
+                         inp=[x, args.class_num, args.img_size, args.anchors, 'train', args.multi_scale_train, args.use_mix_up, args.letterbox_resize],
                          Tout=[tf.int64, tf.float32, tf.float32, tf.float32, tf.float32]),
     num_parallel_calls=args.num_threads
 )
@@ -46,7 +46,7 @@
 val_dataset = val_dataset.batch(1)
 val_dataset = val_dataset.map(
     lambda x: tf.py_func(get_batch_data,
-                         inp=[x, args.class_num, args.img_size, args.anchors, 'val', False, False],
+                         inp=[x, args.class_num, args.img_size, args.anchors, 'val', False, False, args.letterbox_resize],
                          Tout=[tf.int64, tf.float32, tf.float32, tf.float32, tf.float32]),
     num_parallel_calls=args.num_threads
 )
@@ -107,7 +107,12 @@
 # set dependencies for BN ops
 update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
 with tf.control_dependencies(update_ops):
-    train_op = optimizer.minimize(loss[0] + l2_loss, var_list=update_vars, global_step=global_step)
+    # train_op = optimizer.minimize(loss[0] + l2_loss, var_list=update_vars, global_step=global_step)
+    # apply gradient clipping to avoid gradient explosion
+    gvs = optimizer.compute_gradients(loss[0] + l2_loss, var_list=update_vars)
+    clip_grad_var = [gv if gv[0] is None else [
+        tf.clip_by_norm(gv[0], 50.), gv[1]] for gv in gvs]
+    train_op = optimizer.apply_gradients(clip_grad_var, global_step=global_step)
 
 if args.save_optimizer:
     print('Saving optimizer parameters to checkpoint! Remember to restore the global_step in the fine-tuning afterwards.')
@@ -166,7 +171,7 @@
             saver_to_save.save(sess, args.save_dir + 'model-epoch_{}_step_{}_loss_{:.4f}_lr_{:.5g}'.format(epoch, int(__global_step), loss_total.average, __lr))
 
         # switch to validation dataset for evaluation
-        if epoch % args.val_evaluation_epoch == 0 and epoch > 0:
+        if epoch % args.val_evaluation_epoch == 0 and epoch >= args.warm_up_epoch:
             sess.run(val_init_op)
 
             val_loss_total, val_loss_xy, val_loss_wh, val_loss_conf, val_loss_class = \
@@ -187,12 +192,12 @@
 
             # calc mAP
             rec_total, prec_total, ap_total = AverageMeter(), AverageMeter(), AverageMeter()
-            gt_dict = parse_gt_rec(args.val_file, args.img_size)
+            gt_dict = parse_gt_rec(args.val_file, args.img_size, args.letterbox_resize)
 
             info = '======> Epoch: {}, global_step: {}, lr: {:.6g} <======\n'.format(epoch, __global_step, __lr)
 
             for ii in range(args.class_num):
-                npos, nd, rec, prec, ap = voc_eval(gt_dict, val_preds, ii, iou_thres=args.eval_threshold, use_07_metric=False)
+                npos, nd, rec, prec, ap = voc_eval(gt_dict, val_preds, ii, iou_thres=args.eval_threshold, use_07_metric=args.use_voc_07_metric)
                 info += 'EVAL: Class {}: Recall: {:.4f}, Precision: {:.4f}, AP: {:.4f}\n'.format(ii, rec, prec, ap)
                 rec_total.update(rec, npos)
                 prec_total.update(prec, nd)
```
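The commit clips each gradient tensor to norm 50 with `tf.clip_by_norm`. An alternative worth knowing is global-norm clipping, which rescales all gradients jointly and preserves their relative magnitudes; the sketch below reuses the names `loss`, `l2_loss`, `update_vars`, `optimizer`, and `global_step` from `train.py` and is not what the commit does:

```python
import tensorflow as tf

# Sketch: global-norm clipping instead of per-tensor tf.clip_by_norm.
gvs = optimizer.compute_gradients(loss[0] + l2_loss, var_list=update_vars)
grads, variables = zip(*gvs)
clipped_grads, _ = tf.clip_by_global_norm(grads, 50.)  # None entries pass through
train_op = optimizer.apply_gradients(zip(clipped_grads, variables),
                                     global_step=global_step)
```

Either form addresses the exploding-gradient / nan-loss issue the README mentions.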

utils/data_aug.py (+24 −18)

```diff
@@ -271,24 +271,35 @@ def random_brightness(img, brightness_delta, p=0.5):
     return img
 
 
-def resize_with_bbox(img, bbox, new_width, new_height, interp=0, letterbox=False):
+def letterbox_resize(img, new_width, new_height, interp=0):
     '''
-    Resize the image and correct the bbox accordingly.
+    Letterbox resize. Keep the original aspect ratio in the resized image.
     '''
     ori_height, ori_width = img.shape[:2]
 
-    if letterbox:
-        resize_ratio = min(new_width / ori_width, new_height / ori_height)
-        resize_w = int(resize_ratio * ori_width)
-        resize_h = int(resize_ratio * ori_height)
-        img = cv2.resize(img, (resize_w, resize_h), interpolation=interp)
+    resize_ratio = min(new_width / ori_width, new_height / ori_height)
+
+    resize_w = int(resize_ratio * ori_width)
+    resize_h = int(resize_ratio * ori_height)
+
+    img = cv2.resize(img, (resize_w, resize_h), interpolation=interp)
+    image_padded = np.full((new_height, new_width, 3), 128, np.uint8)
+
+    dw = int((new_width - resize_w) / 2)
+    dh = int((new_height - resize_h) / 2)
 
-        image_padded = np.full((new_height, new_width, 3), 128, np.uint8)
+    image_padded[dh: resize_h + dh, dw: resize_w + dw, :] = img
 
-        dw = int((new_width - resize_w) / 2)
-        dh = int((new_height - resize_h) / 2)
+    return image_padded, resize_ratio, dw, dh
 
-        image_padded[dh: resize_h + dh, dw: resize_w + dw, :] = img
+
+def resize_with_bbox(img, bbox, new_width, new_height, interp=0, letterbox=False):
+    '''
+    Resize the image and correct the bbox accordingly.
+    '''
+
+    if letterbox:
+        image_padded, resize_ratio, dw, dh = letterbox_resize(img, new_width, new_height, interp)
 
         # xmin, xmax
         bbox[:, [0, 2]] = bbox[:, [0, 2]] * resize_ratio + dw
@@ -297,6 +308,8 @@ def resize_with_bbox(img, bbox, new_width, new_height, interp=0, letterbox=False):
 
         return image_padded, bbox
     else:
+        ori_height, ori_width = img.shape[:2]
+
         img = cv2.resize(img, (new_width, new_height), interpolation=interp)
 
         # xmin, xmax
@@ -365,10 +378,3 @@ def random_expand(img, bbox, max_ratio=4, fill=0, keep_ratio=True):
         bbox[:, 2:4] += (off_x, off_y)
 
     return dst, bbox
-
-
-
-
-
-
-
```
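A short usage sketch of the refactored pair; the image path and box values are placeholders:

```python
import cv2
import numpy as np
from utils.data_aug import letterbox_resize, resize_with_bbox

img = cv2.imread('xxx/xxx/a.jpg')  # placeholder path
bbox = np.asarray([[453., 369., 473., 391.]])

# letterbox_resize alone returns the padded image plus the transform params
img_padded, resize_ratio, dw, dh = letterbox_resize(img, 416, 416)
assert img_padded.shape == (416, 416, 3)

# resize_with_bbox(letterbox=True) reuses it and shifts the boxes accordingly
img_padded, bbox = resize_with_bbox(img, bbox, 416, 416, interp=0, letterbox=True)
```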
