.. automodule:: tensorlayer.prepro
.. autosummary:: affine_rotation_matrix affine_horizontal_flip_matrix affine_vertical_flip_matrix affine_shift_matrix affine_shear_matrix affine_zoom_matrix affine_respective_zoom_matrix transform_matrix_offset_center affine_transform affine_transform_cv2 affine_transform_keypoints projective_transform_by_points rotation rotation_multi crop crop_multi flip_axis flip_axis_multi shift shift_multi shear shear_multi shear2 shear_multi2 swirl swirl_multi elastic_transform elastic_transform_multi zoom respective_zoom zoom_multi brightness brightness_multi illumination rgb_to_hsv hsv_to_rgb adjust_hue imresize pixel_value_scale samplewise_norm featurewise_norm channel_shift channel_shift_multi drop array_to_img find_contours pt2map binary_dilation dilation binary_erosion erosion obj_box_coord_rescale obj_box_coords_rescale obj_box_coord_scale_to_pixelunit obj_box_coord_centroid_to_upleft_butright obj_box_coord_upleft_butright_to_centroid obj_box_coord_centroid_to_upleft obj_box_coord_upleft_to_centroid obj_box_coord_affine rotated_obj_box_coord_affine parse_darknet_ann_str_to_list parse_darknet_ann_list_to_cls_box obj_box_horizontal_flip obj_box_imresize obj_box_crop obj_box_shift obj_box_zoom keypoint_random_crop keypoint_resize_random_crop keypoint_random_rotate keypoint_random_flip keypoint_random_resize keypoint_random_resize_shortestedge pad_sequences remove_pad_sequences process_sequences sequences_add_start_id sequences_add_end_id sequences_add_end_id_after_pad sequences_get_mask
Image augmentation is a critical step in deep learning.
Though TensorFlow has provided tf.image
,
image augmentation often remains as a key bottleneck.
tf.image
has three limitations:
- Real-world visual tasks such as object detection, segmentation, and pose estimation
must cope with image meta-data (e.g., coordinates).
These data are beyond
tf.image
which processes images as tensors. tf.image
operators breaks the pure Python programing experience (i.e., users have to usetf.py_func
in order to call image functions written in Python); however, frequent uses oftf.py_func
slow down TensorFlow, making users hard to balance flexibility and performance.tf.image
API is inflexible. Image operations are performed in an order. They are hard to jointly optimize. More importantly, sequential image operations can significantly reduces the quality of images, thus affecting training accuracy.
TensorLayer addresses these limitations by providing a
high-performance image augmentation API in Python.
This API bases on affine transformation and cv2.wrapAffine
.
It allows you to combine multiple image processing functions into
a single matrix operation. This combined operation
is executed by the fast cv2
library, offering 78x performance improvement (observed in
openpose-plus for example).
The following example illustrates the rationale
behind this tremendous speed up.
The source code of complete examples can be found here. The following is a typical Python program that applies rotation, shifting, flipping, zooming and shearing to an image,
image = tl.vis.read_image('tiger.jpeg')
xx = tl.prepro.rotation(image, rg=-20, is_random=False)
xx = tl.prepro.flip_axis(xx, axis=1, is_random=False)
xx = tl.prepro.shear2(xx, shear=(0., -0.2), is_random=False)
xx = tl.prepro.zoom(xx, zoom_range=0.8)
xx = tl.prepro.shift(xx, wrg=-0.1, hrg=0, is_random=False)
tl.vis.save_image(xx, '_result_slow.png')
However, by leveraging affine transformation, image operations can be combined into one:
# 1. Create required affine transformation matrices
M_rotate = tl.prepro.affine_rotation_matrix(angle=20)
M_flip = tl.prepro.affine_horizontal_flip_matrix(prob=1)
M_shift = tl.prepro.affine_shift_matrix(wrg=0.1, hrg=0, h=h, w=w)
M_shear = tl.prepro.affine_shear_matrix(x_shear=0.2, y_shear=0)
M_zoom = tl.prepro.affine_zoom_matrix(zoom_range=0.8)
# 2. Combine matrices
# NOTE: operations are applied in a reversed order (i.e., rotation is performed first)
M_combined = M_shift.dot(M_zoom).dot(M_shear).dot(M_flip).dot(M_rotate)
# 3. Convert the matrix from Cartesian coordinates (the origin in the middle of image)
# to image coordinates (the origin on the top-left of image)
transform_matrix = tl.prepro.transform_matrix_offset_center(M_combined, x=w, y=h)
# 4. Transform the image using a single operation
result = tl.prepro.affine_transform_cv2(image, transform_matrix) # 76 times faster
tl.vis.save_image(result, '_result_fast.png')
The following figure illustrates the rational behind combined affine transformation.
Using combined affine transformation has two key benefits. First, it allows you to leverage a pure Python API to achieve orders of magnitudes of speed up in image augmentation, and thus prevent data pre-processing from becoming a bottleneck in training. Second, performing sequential image transformation requires multiple image interpolations. This produces low-quality input images. In contrast, a combined transformation performs the interpolation only once, and thus preserve the content in an image. The following figure illustrates these two benefits:
The major reason for combined affine transformation being fast is because it has lower computational complexity.
Assume we have k
affine transformations T1, ..., Tk
, where Ti
can be represented by 3x3 matrixes.
The sequential transformation can be represented as y = Tk (... T1(x))
,
and the time complexity is O(k N)
where N
is the cost of applying one transformation to image x
.
N
is linear to the size of x
.
For the combined transformation y = (Tk ... T1) (x)
the time complexity is O(27(k - 1) + N) = max{O(27k), O(N)} = O(N)
(assuming 27k << N) where 27 = 3^3 is the cost for combining two transformations.
.. autofunction:: affine_rotation_matrix
.. autofunction:: affine_horizontal_flip_matrix
.. autofunction:: affine_vertical_flip_matrix
.. autofunction:: affine_shift_matrix
.. autofunction:: affine_shear_matrix
.. autofunction:: affine_zoom_matrix
.. autofunction:: affine_respective_zoom_matrix
.. autofunction:: transform_matrix_offset_center
.. autofunction:: affine_transform_cv2
.. autofunction:: affine_transform_keypoints
.. autofunction:: projective_transform_by_points
.. autofunction:: rotation
.. autofunction:: rotation_multi
.. autofunction:: crop
.. autofunction:: crop_multi
.. autofunction:: flip_axis
.. autofunction:: flip_axis_multi
.. autofunction:: shift
.. autofunction:: shift_multi
.. autofunction:: shear
.. autofunction:: shear_multi
.. autofunction:: shear2
.. autofunction:: shear_multi2
.. autofunction:: swirl
.. autofunction:: swirl_multi
.. autofunction:: elastic_transform
.. autofunction:: elastic_transform_multi
.. autofunction:: zoom
.. autofunction:: zoom_multi
.. autofunction:: respective_zoom
.. autofunction:: brightness
.. autofunction:: brightness_multi
.. autofunction:: illumination
.. autofunction:: rgb_to_hsv
.. autofunction:: hsv_to_rgb
.. autofunction:: adjust_hue
.. autofunction:: imresize
.. autofunction:: pixel_value_scale
.. autofunction:: samplewise_norm
.. autofunction:: featurewise_norm
.. autofunction:: channel_shift
.. autofunction:: channel_shift_multi
.. autofunction:: drop
.. autofunction:: array_to_img
.. autofunction:: find_contours
.. autofunction:: pt2map
.. autofunction:: binary_dilation
.. autofunction:: dilation
.. autofunction:: binary_erosion
.. autofunction:: erosion
Hi, here is an example for image augmentation on VOC dataset.
import tensorlayer as tl
## download VOC 2012 dataset
imgs_file_list, _, _, _, classes, _, _,\
_, objs_info_list, _ = tl.files.load_voc_dataset(dataset="2012")
## parse annotation and convert it into list format
ann_list = []
for info in objs_info_list:
ann = tl.prepro.parse_darknet_ann_str_to_list(info)
c, b = tl.prepro.parse_darknet_ann_list_to_cls_box(ann)
ann_list.append([c, b])
# read and save one image
idx = 2 # you can select your own image
image = tl.vis.read_image(imgs_file_list[idx])
tl.vis.draw_boxes_and_labels_to_image(image, ann_list[idx][0],
ann_list[idx][1], [], classes, True, save_name='_im_original.png')
# left right flip
im_flip, coords = tl.prepro.obj_box_horizontal_flip(image,
ann_list[idx][1], is_rescale=True, is_center=True, is_random=False)
tl.vis.draw_boxes_and_labels_to_image(im_flip, ann_list[idx][0],
coords, [], classes, True, save_name='_im_flip.png')
# resize
im_resize, coords = tl.prepro.obj_box_imresize(image,
coords=ann_list[idx][1], size=[300, 200], is_rescale=True)
tl.vis.draw_boxes_and_labels_to_image(im_resize, ann_list[idx][0],
coords, [], classes, True, save_name='_im_resize.png')
# crop
im_crop, clas, coords = tl.prepro.obj_box_crop(image, ann_list[idx][0],
ann_list[idx][1], wrg=200, hrg=200,
is_rescale=True, is_center=True, is_random=False)
tl.vis.draw_boxes_and_labels_to_image(im_crop, clas, coords, [],
classes, True, save_name='_im_crop.png')
# shift
im_shfit, clas, coords = tl.prepro.obj_box_shift(image, ann_list[idx][0],
ann_list[idx][1], wrg=0.1, hrg=0.1,
is_rescale=True, is_center=True, is_random=False)
tl.vis.draw_boxes_and_labels_to_image(im_shfit, clas, coords, [],
classes, True, save_name='_im_shift.png')
# zoom
im_zoom, clas, coords = tl.prepro.obj_box_zoom(image, ann_list[idx][0],
ann_list[idx][1], zoom_range=(1.3, 0.7),
is_rescale=True, is_center=True, is_random=False)
tl.vis.draw_boxes_and_labels_to_image(im_zoom, clas, coords, [],
classes, True, save_name='_im_zoom.png')
In practice, you may want to use threading method to process a batch of images as follows.
import tensorlayer as tl
import random
batch_size = 64
im_size = [416, 416]
n_data = len(imgs_file_list)
jitter = 0.2
def _data_pre_aug_fn(data):
im, ann = data
clas, coords = ann
## change image brightness, contrast and saturation randomly
im = tl.prepro.illumination(im, gamma=(0.5, 1.5),
contrast=(0.5, 1.5), saturation=(0.5, 1.5), is_random=True)
## flip randomly
im, coords = tl.prepro.obj_box_horizontal_flip(im, coords,
is_rescale=True, is_center=True, is_random=True)
## randomly resize and crop image, it can have same effect as random zoom
tmp0 = random.randint(1, int(im_size[0]*jitter))
tmp1 = random.randint(1, int(im_size[1]*jitter))
im, coords = tl.prepro.obj_box_imresize(im, coords,
[im_size[0]+tmp0, im_size[1]+tmp1], is_rescale=True,
interp='bicubic')
im, clas, coords = tl.prepro.obj_box_crop(im, clas, coords,
wrg=im_size[1], hrg=im_size[0], is_rescale=True,
is_center=True, is_random=True)
## rescale value from [0, 255] to [-1, 1] (optional)
im = im / 127.5 - 1
return im, [clas, coords]
# randomly read a batch of image and the corresponding annotations
idexs = tl.utils.get_random_int(min=0, max=n_data-1, number=batch_size)
b_im_path = [imgs_file_list[i] for i in idexs]
b_images = tl.prepro.threading_data(b_im_path, fn=tl.vis.read_image)
b_ann = [ann_list[i] for i in idexs]
# threading process
data = tl.prepro.threading_data([_ for _ in zip(b_images, b_ann)],
_data_pre_aug_fn)
b_images2 = [d[0] for d in data]
b_ann = [d[1] for d in data]
# save all images
for i in range(len(b_images)):
tl.vis.draw_boxes_and_labels_to_image(b_images[i],
ann_list[idexs[i]][0], ann_list[idexs[i]][1], [],
classes, True, save_name='_bbox_vis_%d_original.png' % i)
tl.vis.draw_boxes_and_labels_to_image((b_images2[i]+1)*127.5,
b_ann[i][0], b_ann[i][1], [], classes, True,
save_name='_bbox_vis_%d.png' % i)
- Example code for VOC here.
.. autofunction:: obj_box_coord_rescale
.. autofunction:: obj_box_coords_rescale
.. autofunction:: obj_box_coord_scale_to_pixelunit
.. autofunction:: obj_box_coord_centroid_to_upleft_butright
.. autofunction:: obj_box_coord_upleft_butright_to_centroid
.. autofunction:: obj_box_coord_centroid_to_upleft
.. autofunction:: obj_box_coord_upleft_to_centroid
.. autofunction:: parse_darknet_ann_str_to_list
.. autofunction:: parse_darknet_ann_list_to_cls_box
.. autofunction:: obj_box_horizontal_flip
.. autofunction:: obj_box_imresize
.. autofunction:: obj_box_crop
.. autofunction:: obj_box_shift
.. autofunction:: obj_box_zoom
.. autofunction:: obj_box_coord_affine
.. autofunction:: rotated_obj_box_coord_affine
.. autofunction:: keypoint_random_crop
.. autofunction:: keypoint_resize_random_crop
.. autofunction:: keypoint_random_rotate
.. autofunction:: keypoint_random_flip
.. autofunction:: keypoint_random_resize
.. autofunction:: keypoint_random_resize_shortestedge
More related functions can be found in tensorlayer.nlp
.
.. autofunction:: pad_sequences
.. autofunction:: remove_pad_sequences
.. autofunction:: process_sequences
.. autofunction:: sequences_add_start_id
.. autofunction:: sequences_add_end_id
.. autofunction:: sequences_add_end_id_after_pad
.. autofunction:: sequences_get_mask