API - Data Pre-Processing

.. automodule:: tensorlayer.prepro

.. autosummary::

   affine_rotation_matrix
   affine_horizontal_flip_matrix
   affine_vertical_flip_matrix
   affine_shift_matrix
   affine_shear_matrix
   affine_zoom_matrix
   affine_respective_zoom_matrix

   transform_matrix_offset_center
   affine_transform
   affine_transform_cv2
   affine_transform_keypoints
   projective_transform_by_points

   rotation
   rotation_multi
   crop
   crop_multi
   flip_axis
   flip_axis_multi
   shift
   shift_multi

   shear
   shear_multi
   shear2
   shear_multi2
   swirl
   swirl_multi
   elastic_transform
   elastic_transform_multi

   zoom
   respective_zoom
   zoom_multi

   brightness
   brightness_multi

   illumination

   rgb_to_hsv
   hsv_to_rgb
   adjust_hue

   imresize

   pixel_value_scale

   samplewise_norm
   featurewise_norm

   channel_shift
   channel_shift_multi

   drop

   array_to_img

   find_contours
   pt2map
   binary_dilation
   dilation
   binary_erosion
   erosion


   obj_box_coord_rescale
   obj_box_coords_rescale
   obj_box_coord_scale_to_pixelunit
   obj_box_coord_centroid_to_upleft_butright
   obj_box_coord_upleft_butright_to_centroid
   obj_box_coord_centroid_to_upleft
   obj_box_coord_upleft_to_centroid
   obj_box_coord_affine
   rotated_obj_box_coord_affine

   parse_darknet_ann_str_to_list
   parse_darknet_ann_list_to_cls_box

   obj_box_horizontal_flip
   obj_box_imresize
   obj_box_crop
   obj_box_shift
   obj_box_zoom

   keypoint_random_crop
   keypoint_resize_random_crop
   keypoint_random_rotate
   keypoint_random_flip
   keypoint_random_resize
   keypoint_random_resize_shortestedge

   pad_sequences
   remove_pad_sequences
   process_sequences
   sequences_add_start_id
   sequences_add_end_id
   sequences_add_end_id_after_pad
   sequences_get_mask


Affine Transform

Python can be FAST

Image augmentation is a critical step in deep learning. Although TensorFlow provides tf.image, image augmentation often remains a key bottleneck. tf.image has three limitations:

  • Real-world visual tasks such as object detection, segmentation, and pose estimation must cope with image metadata (e.g., coordinates). Such data is beyond the scope of tf.image, which processes images as tensors.
  • tf.image operators break the pure Python programming experience (i.e., users have to use tf.py_func to call image functions written in Python); however, frequent use of tf.py_func slows down TensorFlow, making it hard for users to balance flexibility and performance.
  • The tf.image API is inflexible. Image operations are performed one after another and are hard to optimize jointly. More importantly, sequential image operations can significantly reduce image quality, thus affecting training accuracy.

TensorLayer addresses these limitations by providing a high-performance image augmentation API in Python. This API is based on affine transformation and cv2.warpAffine. It allows you to combine multiple image processing functions into a single matrix operation. This combined operation is executed by the fast cv2 library, offering a 78x performance improvement (observed in openpose-plus, for example). The following example illustrates the rationale behind this speed-up.

Example

The source code of complete examples can be found here. The following is a typical Python program that applies rotation, flipping, shearing, zooming and shifting to an image, one operation at a time:

import tensorlayer as tl

image = tl.vis.read_image('tiger.jpeg')

xx = tl.prepro.rotation(image, rg=-20, is_random=False)
xx = tl.prepro.flip_axis(xx, axis=1, is_random=False)
xx = tl.prepro.shear2(xx, shear=(0., -0.2), is_random=False)
xx = tl.prepro.zoom(xx, zoom_range=0.8)
xx = tl.prepro.shift(xx, wrg=-0.1, hrg=0, is_random=False)

tl.vis.save_image(xx, '_result_slow.png')

However, by leveraging affine transformation, image operations can be combined into one:

h, w = image.shape[:2]  # image height and width, used in steps 1 and 3

# 1. Create required affine transformation matrices
M_rotate = tl.prepro.affine_rotation_matrix(angle=20)
M_flip = tl.prepro.affine_horizontal_flip_matrix(prob=1)
M_shift = tl.prepro.affine_shift_matrix(wrg=0.1, hrg=0, h=h, w=w)
M_shear = tl.prepro.affine_shear_matrix(x_shear=0.2, y_shear=0)
M_zoom = tl.prepro.affine_zoom_matrix(zoom_range=0.8)

# 2. Combine matrices
# NOTE: matrices are applied from right to left (i.e., rotation is performed first)
M_combined = M_shift.dot(M_zoom).dot(M_shear).dot(M_flip).dot(M_rotate)

# 3. Convert the matrix from Cartesian coordinates (the origin in the middle of image)
# to image coordinates (the origin on the top-left of image)
transform_matrix = tl.prepro.transform_matrix_offset_center(M_combined, x=w, y=h)

# 4. Transform the image using a single operation
result = tl.prepro.affine_transform_cv2(image, transform_matrix)  # 76 times faster

tl.vis.save_image(result, '_result_fast.png')
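
Step 3 can be pictured as wrapping the combined matrix between two translations, so that the transformation pivots around the image centre rather than the top-left corner. The NumPy sketch below only illustrates that idea: ``offset_to_center`` is a hypothetical helper, and the centre convention it uses is an assumption that may differ from what ``transform_matrix_offset_center`` does internally.

```python
import numpy as np

def offset_to_center(matrix, h, w):
    """Hypothetical helper: make a 3x3 transform act around the image centre."""
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0  # assumed centre convention
    to_center = np.array([[1.0, 0.0, cx], [0.0, 1.0, cy], [0.0, 0.0, 1.0]])
    to_origin = np.array([[1.0, 0.0, -cx], [0.0, 1.0, -cy], [0.0, 0.0, 1.0]])
    # Move the centre to the origin, transform, then move it back.
    return to_center @ matrix @ to_origin

theta = np.deg2rad(20)
rotate = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])

h, w = 101, 201
centered = offset_to_center(rotate, h, w)

# The image centre is a fixed point of the centred rotation.
center = np.array([(w - 1) / 2.0, (h - 1) / 2.0, 1.0])
center_after = centered @ center
```

Without the two translations, the same rotation would swing the image around pixel (0, 0) and push most of it out of the frame.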

The following figure illustrates the rationale behind combined affine transformation.

../images/affine_transform_why.jpg

Using combined affine transformation has two key benefits. First, it allows you to use a pure Python API to achieve an orders-of-magnitude speed-up in image augmentation, and thus prevents data pre-processing from becoming a bottleneck in training. Second, performing sequential image transformations requires multiple image interpolations, which produces low-quality input images. In contrast, a combined transformation performs the interpolation only once, and thus preserves the content of the image. The following figure illustrates these two benefits:

../images/affine_transform_comparison.jpg
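
The quality argument can be reproduced in one dimension with plain NumPy. The sketch below is an analogy rather than the image pipeline itself: ``shift_linear`` is a hypothetical helper that resamples a signal with linear interpolation, and an impulse stands in for a sharp image detail.

```python
import numpy as np

def shift_linear(signal, offset):
    """Shift a 1-D signal right by `offset` samples using linear interpolation."""
    idx = np.arange(len(signal), dtype=float)
    return np.interp(idx - offset, idx, signal)

impulse = np.zeros(7)
impulse[3] = 1.0

# Two half-sample shifts: the signal is interpolated twice.
twice = shift_linear(shift_linear(impulse, 0.5), 0.5)
# twice -> [0, 0, 0, 0.25, 0.5, 0.25, 0]

# One full-sample shift: same total displacement, interpolated once.
once = shift_linear(impulse, 1.0)
# once -> [0, 0, 0, 0, 1, 0, 0]
```

Both results are displaced by one sample, but the twice-interpolated impulse is smeared over three samples and its peak is halved, whereas the single interpolation keeps it intact.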

Combined affine transformation is fast mainly because it has lower computational complexity. Assume we have k affine transformations T1, ..., Tk, where each Ti is represented by a 3x3 matrix. The sequential transformation can be written as y = Tk (... T1(x)), and its time complexity is O(k N), where N is the cost of applying one transformation to image x. N is linear in the size of x. For the combined transformation y = (Tk ... T1) (x), the time complexity is O(27(k - 1) + N) = max{O(27k), O(N)} = O(N) (assuming 27k << N), where 27 = 3^3 is the cost of multiplying two 3x3 matrices, i.e., of combining two transformations.
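
The equivalence this argument relies on, Tk (... T1(x)) = (Tk ... T1)(x), can be checked with plain NumPy. The sketch below is illustrative only: the helper names are hypothetical and are not the ``tl.prepro`` matrix functions, and a handful of points stands in for the pixels of an image.

```python
import numpy as np

def rotation_matrix(angle_deg):
    """Hypothetical helper: 3x3 homogeneous rotation matrix."""
    t = np.deg2rad(angle_deg)
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t),  np.cos(t), 0.0],
                     [0.0, 0.0, 1.0]])

def shear_matrix(x_shear):
    """Hypothetical helper: 3x3 homogeneous shear along x."""
    return np.array([[1.0, x_shear, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

def zoom_matrix(factor):
    """Hypothetical helper: 3x3 homogeneous uniform scaling."""
    return np.array([[factor, 0.0, 0.0],
                     [0.0, factor, 0.0],
                     [0.0, 0.0, 1.0]])

T1, T2, T3 = rotation_matrix(20), shear_matrix(0.2), zoom_matrix(0.8)

# A few points in homogeneous coordinates, one per column.
points = np.array([[0.0, 10.0, -5.0],
                   [0.0,  3.0,  7.0],
                   [1.0,  1.0,  1.0]])

# Sequential: k passes over the data.
sequential = T3 @ (T2 @ (T1 @ points))

# Combined: k - 1 small 3x3 products, then a single pass over the data.
combined = (T3 @ T2 @ T1) @ points

same = np.allclose(sequential, combined)  # True, by associativity
```

With real images the data pass dominates, so replacing k passes with one is where the saving comes from.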

Get rotation matrix

.. autofunction:: affine_rotation_matrix

Get horizontal flipping matrix

.. autofunction:: affine_horizontal_flip_matrix

Get vertical flipping matrix

.. autofunction:: affine_vertical_flip_matrix

Get shifting matrix

.. autofunction:: affine_shift_matrix

Get shearing matrix

.. autofunction:: affine_shear_matrix

Get zooming matrix

.. autofunction:: affine_zoom_matrix

Get respective zooming matrix

.. autofunction:: affine_respective_zoom_matrix

Cartesian to image coordinates

.. autofunction:: transform_matrix_offset_center

Apply image transform

.. autofunction:: affine_transform_cv2

Apply keypoint transform

.. autofunction:: affine_transform_keypoints
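
Keypoints follow the image when the same matrix is applied to their coordinates. The following is a minimal NumPy sketch of that idea, assuming (x, y) points and a 3x3 matrix in the same coordinate system; ``transform_points`` is a hypothetical helper, not the library function documented above.

```python
import numpy as np

def transform_points(points_xy, matrix):
    """Hypothetical helper: apply a 3x3 affine matrix to (x, y) points."""
    pts = np.asarray(points_xy, dtype=float)
    ones = np.ones((pts.shape[0], 1))
    homogeneous = np.hstack([pts, ones])   # shape (n, 3)
    moved = homogeneous @ matrix.T         # row-vector form of M @ p
    return moved[:, :2]

# Shift by +10 in x and -5 in y.
shift = np.array([[1.0, 0.0, 10.0],
                  [0.0, 1.0, -5.0],
                  [0.0, 0.0, 1.0]])

keypoints = [(0, 0), (20, 30)]
moved = transform_points(keypoints, shift)
# moved -> [[10, -5], [30, 25]]
```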


Images

Projective transform by points

.. autofunction:: projective_transform_by_points

Rotation

.. autofunction:: rotation
.. autofunction:: rotation_multi

Crop

.. autofunction:: crop
.. autofunction:: crop_multi

Flip

.. autofunction:: flip_axis
.. autofunction:: flip_axis_multi

Shift

.. autofunction:: shift
.. autofunction:: shift_multi

Shear

.. autofunction:: shear
.. autofunction:: shear_multi

Shear V2

.. autofunction:: shear2
.. autofunction:: shear_multi2

Swirl

.. autofunction:: swirl
.. autofunction:: swirl_multi

Elastic transform

.. autofunction:: elastic_transform
.. autofunction:: elastic_transform_multi

Zoom

.. autofunction:: zoom
.. autofunction:: zoom_multi

Respective Zoom

.. autofunction:: respective_zoom

Brightness

.. autofunction:: brightness
.. autofunction:: brightness_multi

Brightness, contrast and saturation

.. autofunction:: illumination

RGB to HSV

.. autofunction:: rgb_to_hsv

HSV to RGB

.. autofunction:: hsv_to_rgb

Adjust Hue

.. autofunction:: adjust_hue
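
To see what a hue adjustment does to a single pixel, the round trip through HSV can be sketched with Python's standard ``colorsys`` module. This is only an illustration of the colour-space arithmetic on one pixel; it is not how the functions above are implemented, and the half-turn hue offset is an arbitrary choice.

```python
import colorsys

# One pure-red pixel in 8-bit RGB.
r, g, b = 255, 0, 0

# colorsys works on floats in [0, 1].
h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
# (h, s, v) -> (0.0, 1.0, 1.0)

# Rotate the hue by half a turn; saturation and value are untouched.
h_shifted = (h + 0.5) % 1.0
r2, g2, b2 = colorsys.hsv_to_rgb(h_shifted, s, v)

shifted_rgb = tuple(round(c * 255) for c in (r2, g2, b2))
# shifted_rgb -> (0, 255, 255), i.e. cyan
```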

Resize

.. autofunction:: imresize

Pixel value scale

.. autofunction:: pixel_value_scale

Normalization

.. autofunction:: samplewise_norm
.. autofunction:: featurewise_norm
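
The difference between the two normalizations lies in where the statistics come from. The NumPy sketch below shows the arithmetic under simple assumptions (global mean and standard deviation per image, per-pixel statistics over a dataset, and a small ``eps`` to avoid division by zero); the options of the functions above are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.uniform(0, 255, size=(8, 8, 3))
eps = 1e-7  # assumed guard against division by zero

# Sample-wise: statistics come from this one image.
sample_normed = (image - image.mean()) / (image.std() + eps)

# Feature-wise: statistics are computed once over a dataset and reused.
dataset = rng.uniform(0, 255, size=(16, 8, 8, 3))
mean = dataset.mean(axis=0)
std = dataset.std(axis=0)
feature_normed = (image - mean) / (std + eps)
```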

Channel shift

.. autofunction:: channel_shift
.. autofunction:: channel_shift_multi

Noise

.. autofunction:: drop

Numpy and PIL

.. autofunction:: array_to_img

Find contours

.. autofunction:: find_contours

Points to Image

.. autofunction:: pt2map

Binary dilation

.. autofunction:: binary_dilation

Greyscale dilation

.. autofunction:: dilation

Binary erosion

.. autofunction:: binary_erosion

Greyscale erosion

.. autofunction:: erosion



Object detection

Tutorial for Image Aug

Here is an example of image augmentation on the VOC dataset.

import tensorlayer as tl

## download VOC 2012 dataset
imgs_file_list, _, _, _, classes, _, _,\
    _, objs_info_list, _ = tl.files.load_voc_dataset(dataset="2012")

## parse annotation and convert it into list format
ann_list = []
for info in objs_info_list:
    ann = tl.prepro.parse_darknet_ann_str_to_list(info)
    c, b = tl.prepro.parse_darknet_ann_list_to_cls_box(ann)
    ann_list.append([c, b])

# read and save one image
idx = 2  # you can select your own image
image = tl.vis.read_image(imgs_file_list[idx])
tl.vis.draw_boxes_and_labels_to_image(image, ann_list[idx][0],
     ann_list[idx][1], [], classes, True, save_name='_im_original.png')

# left right flip
im_flip, coords = tl.prepro.obj_box_horizontal_flip(image,
        ann_list[idx][1], is_rescale=True, is_center=True, is_random=False)
tl.vis.draw_boxes_and_labels_to_image(im_flip, ann_list[idx][0],
        coords, [], classes, True, save_name='_im_flip.png')

# resize
im_resize, coords = tl.prepro.obj_box_imresize(image,
        coords=ann_list[idx][1], size=[300, 200], is_rescale=True)
tl.vis.draw_boxes_and_labels_to_image(im_resize, ann_list[idx][0],
        coords, [], classes, True, save_name='_im_resize.png')

# crop
im_crop, clas, coords = tl.prepro.obj_box_crop(image, ann_list[idx][0],
         ann_list[idx][1], wrg=200, hrg=200,
         is_rescale=True, is_center=True, is_random=False)
tl.vis.draw_boxes_and_labels_to_image(im_crop, clas, coords, [],
         classes, True, save_name='_im_crop.png')

# shift
im_shift, clas, coords = tl.prepro.obj_box_shift(image, ann_list[idx][0],
        ann_list[idx][1], wrg=0.1, hrg=0.1,
        is_rescale=True, is_center=True, is_random=False)
tl.vis.draw_boxes_and_labels_to_image(im_shift, clas, coords, [],
        classes, True, save_name='_im_shift.png')

# zoom
im_zoom, clas, coords = tl.prepro.obj_box_zoom(image, ann_list[idx][0],
        ann_list[idx][1], zoom_range=(1.3, 0.7),
        is_rescale=True, is_center=True, is_random=False)
tl.vis.draw_boxes_and_labels_to_image(im_zoom, clas, coords, [],
        classes, True, save_name='_im_zoom.png')

In practice, you may want to use a threading method to process a batch of images, as follows.

import tensorlayer as tl
import random

batch_size = 64
im_size = [416, 416]
n_data = len(imgs_file_list)
jitter = 0.2
def _data_pre_aug_fn(data):
    im, ann = data
    clas, coords = ann
    ## change image brightness, contrast and saturation randomly
    im = tl.prepro.illumination(im, gamma=(0.5, 1.5),
             contrast=(0.5, 1.5), saturation=(0.5, 1.5), is_random=True)
    ## flip randomly
    im, coords = tl.prepro.obj_box_horizontal_flip(im, coords,
             is_rescale=True, is_center=True, is_random=True)
    ## randomly resize and crop the image; this has a similar effect to a random zoom
    tmp0 = random.randint(1, int(im_size[0]*jitter))
    tmp1 = random.randint(1, int(im_size[1]*jitter))
    im, coords = tl.prepro.obj_box_imresize(im, coords,
            [im_size[0]+tmp0, im_size[1]+tmp1], is_rescale=True,
             interp='bicubic')
    im, clas, coords = tl.prepro.obj_box_crop(im, clas, coords,
             wrg=im_size[1], hrg=im_size[0], is_rescale=True,
             is_center=True, is_random=True)
    ## rescale value from [0, 255] to [-1, 1] (optional)
    im = im / 127.5 - 1
    return im, [clas, coords]

# randomly read a batch of images and the corresponding annotations
idexs = tl.utils.get_random_int(min=0, max=n_data-1, number=batch_size)
b_im_path = [imgs_file_list[i] for i in idexs]
b_images = tl.prepro.threading_data(b_im_path, fn=tl.vis.read_image)
b_ann = [ann_list[i] for i in idexs]

# threading process
data = tl.prepro.threading_data([_ for _ in zip(b_images, b_ann)],
              _data_pre_aug_fn)
b_images2 = [d[0] for d in data]
b_ann = [d[1] for d in data]

# save all images
for i in range(len(b_images)):
    tl.vis.draw_boxes_and_labels_to_image(b_images[i],
             ann_list[idexs[i]][0], ann_list[idexs[i]][1], [],
             classes, True, save_name='_bbox_vis_%d_original.png' % i)
    tl.vis.draw_boxes_and_labels_to_image((b_images2[i]+1)*127.5,
             b_ann[i][0], b_ann[i][1], [], classes, True,
             save_name='_bbox_vis_%d.png' % i)
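
The same fan-out pattern can be sketched with the standard library alone. The example below uses ``concurrent.futures.ThreadPoolExecutor`` as a stand-in for the threading helper, and ``augment`` is a hypothetical toy replacement for ``_data_pre_aug_fn`` that works on nested lists so that the snippet runs without any dataset.

```python
from concurrent.futures import ThreadPoolExecutor

def augment(sample):
    """Hypothetical stand-in for _data_pre_aug_fn: (image, annotation) in and out."""
    image, annotation = sample
    flipped = [row[::-1] for row in image]  # toy horizontal flip on a nested list
    return flipped, annotation

batch = [([[1, 2, 3], [4, 5, 6]], "ann_%d" % i) for i in range(4)]

# map() preserves input order, so results line up with the batch.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(augment, batch))

images = [r[0] for r in results]
annotations = [r[1] for r in results]
```

Keeping the image and its annotation together in one tuple, as the tutorial does, ensures that each worker updates both consistently.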

Image Aug with TF Dataset API

  • Example code for VOC here.

Coordinate pixel unit to percentage

.. autofunction:: obj_box_coord_rescale

Coordinates pixel unit to percentage

.. autofunction:: obj_box_coords_rescale

Coordinate percentage to pixel unit

.. autofunction:: obj_box_coord_scale_to_pixelunit
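
The conversion between pixel units and percentages is a division or multiplication by the image size. The sketch below assumes boxes given as [x, y, w, h] and a shape given as (height, width); ``to_ratio`` and ``to_pixels`` are hypothetical helpers written for illustration, not the functions documented above.

```python
def to_ratio(coord, shape):
    """Hypothetical helper: pixel [x, y, w, h] -> fractions of the image size.

    `shape` is (height, width), following the NumPy image convention.
    """
    height, width = shape
    x, y, w, h = coord
    return [x / width, y / height, w / width, h / height]

def to_pixels(coord, shape):
    """Hypothetical helper: fractional [x, y, w, h] -> pixel units."""
    height, width = shape
    x, y, w, h = coord
    return [x * width, y * height, w * width, h * height]

# x and w are divided by the width (200); y and h by the height (100).
ratio = to_ratio([30, 40, 50, 50], (100, 200))
pixels = to_pixels(ratio, (100, 200))  # recovers the original box
```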

Coordinate [x_center, y_center, w, h] to up-left bottom-right

.. autofunction:: obj_box_coord_centroid_to_upleft_butright

Coordinate up-left bottom-right to [x_center, y_center, w, h]

.. autofunction:: obj_box_coord_upleft_butright_to_centroid

Coordinate [x_center, y_center, w, h] to up-left-width-height

.. autofunction:: obj_box_coord_centroid_to_upleft

Coordinate up-left-width-height to [x_center, y_center, w, h]

.. autofunction:: obj_box_coord_upleft_to_centroid
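
The box formats above differ only in which corner or centre is stored. The arithmetic can be sketched in a few lines of plain Python; the helper names here are hypothetical and the functions are simplified illustrations, not the library implementations.

```python
def centroid_to_corners(coord):
    """[x_center, y_center, w, h] -> [x1, y1, x2, y2]"""
    x_c, y_c, w, h = coord
    return [x_c - w / 2.0, y_c - h / 2.0, x_c + w / 2.0, y_c + h / 2.0]

def corners_to_centroid(coord):
    """[x1, y1, x2, y2] -> [x_center, y_center, w, h]"""
    x1, y1, x2, y2 = coord
    w, h = x2 - x1, y2 - y1
    return [x1 + w / 2.0, y1 + h / 2.0, w, h]

def centroid_to_upleft(coord):
    """[x_center, y_center, w, h] -> [x1, y1, w, h]"""
    x_c, y_c, w, h = coord
    return [x_c - w / 2.0, y_c - h / 2.0, w, h]

corners = centroid_to_corners([30, 40, 20, 20])   # [20.0, 30.0, 40.0, 50.0]
back = corners_to_centroid(corners)               # [30.0, 40.0, 20.0, 20.0]
upleft = centroid_to_upleft([30, 40, 20, 20])     # [20.0, 30.0, 20, 20]
```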

Darknet format string to list

.. autofunction:: parse_darknet_ann_str_to_list

Darknet format split class and coordinate

.. autofunction:: parse_darknet_ann_list_to_cls_box
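
A Darknet-style label stores one object per line as a class index followed by four box values (x_center, y_center, width, height, relative to the image size). The sketch below parses such a string in one pass; ``parse_annotation`` is a hypothetical helper and the sample string is made up, so treat it as an illustration of the format rather than of the two functions above.

```python
def parse_annotation(text):
    """Hypothetical parser: one object per line, 'class x_center y_center w h'."""
    classes, boxes = [], []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        classes.append(int(fields[0]))
        boxes.append([float(v) for v in fields[1:]])
    return classes, boxes

sample = "0 0.5 0.5 0.2 0.4\n14 0.25 0.75 0.1 0.1\n"
classes, boxes = parse_annotation(sample)
# classes -> [0, 14]
# boxes   -> [[0.5, 0.5, 0.2, 0.4], [0.25, 0.75, 0.1, 0.1]]
```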

Image Aug - Flip

.. autofunction:: obj_box_horizontal_flip
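
When an image is mirrored, its boxes must be mirrored with it. A minimal sketch, assuming boxes in [x_center, y_center, w, h] format expressed as fractions of the image size: only the x centre changes, to 1 - x_center. ``flip_boxes_horizontally`` is a hypothetical helper, not the function documented above.

```python
import numpy as np

def flip_boxes_horizontally(image, boxes):
    """Hypothetical helper: mirror an image and its fractional centroid boxes."""
    flipped_image = image[:, ::-1]  # reverse the column order
    flipped_boxes = [[1.0 - x_c, y_c, w, h] for x_c, y_c, w, h in boxes]
    return flipped_image, flipped_boxes

image = np.arange(12).reshape(2, 6)
boxes = [[0.25, 0.5, 0.5, 1.0]]  # a box covering the left half of the image
flipped_image, flipped_boxes = flip_boxes_horizontally(image, boxes)
# flipped_boxes -> [[0.75, 0.5, 0.5, 1.0]], now covering the right half
```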

Image Aug - Resize

.. autofunction:: obj_box_imresize

Image Aug - Crop

.. autofunction:: obj_box_crop

Image Aug - Shift

.. autofunction:: obj_box_shift

Image Aug - Zoom

.. autofunction:: obj_box_zoom

Image Aug - Affine

.. autofunction:: obj_box_coord_affine

Image Aug - Rotated-Affine

.. autofunction:: rotated_obj_box_coord_affine

Keypoints

Image Aug - Crop

.. autofunction:: keypoint_random_crop

Image Aug - Resize then Crop

.. autofunction:: keypoint_resize_random_crop

Image Aug - Rotate

.. autofunction:: keypoint_random_rotate

Image Aug - Flip

.. autofunction:: keypoint_random_flip

Image Aug - Resize

.. autofunction:: keypoint_random_resize

Image Aug - Resize Shortest Edge

.. autofunction:: keypoint_random_resize_shortestedge


Sequence

More related functions can be found in tensorlayer.nlp.

Padding

.. autofunction:: pad_sequences
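
Padding brings variable-length sequences to a common length so that they can be stacked into a batch. The plain-Python sketch below pads at the end and truncates sequences that exceed ``maxlen``; ``pad_post`` is a hypothetical helper, and the function documented above may offer further options that are not reproduced here.

```python
def pad_post(sequences, pad_value=0, maxlen=None):
    """Hypothetical helper: pad each sequence at the end up to a common length."""
    if maxlen is None:
        maxlen = max(len(s) for s in sequences)
    padded = []
    for s in sequences:
        s = list(s[:maxlen])  # truncate sequences that are too long
        padded.append(s + [pad_value] * (maxlen - len(s)))
    return padded

padded = pad_post([[1, 1, 1, 1, 1], [2, 2, 2], [3]])
# padded -> [[1, 1, 1, 1, 1], [2, 2, 2, 0, 0], [3, 0, 0, 0, 0]]
```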

Remove Padding

.. autofunction:: remove_pad_sequences


Process

.. autofunction:: process_sequences

Add Start ID

.. autofunction:: sequences_add_start_id


Add End ID

.. autofunction:: sequences_add_end_id
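
Start and end IDs mark sequence boundaries. The sketch below shows the list manipulation in plain Python; the helper names and the ID values 2 and 999 are illustrative assumptions, not defaults of the functions above.

```python
def add_start_id(sequences, start_id):
    """Hypothetical helper: prepend a start token to every sequence."""
    return [[start_id] + list(s) for s in sequences]

def add_end_id(sequences, end_id):
    """Hypothetical helper: append an end token to every sequence."""
    return [list(s) + [end_id] for s in sequences]

sentences = [[11, 12, 13], [14, 15]]
with_start = add_start_id(sentences, start_id=2)
# with_start -> [[2, 11, 12, 13], [2, 14, 15]]
with_end = add_end_id(sentences, end_id=999)
# with_end -> [[11, 12, 13, 999], [14, 15, 999]]
```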

Add End ID after pad

.. autofunction:: sequences_add_end_id_after_pad

Get Mask

.. autofunction:: sequences_get_mask
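
A mask marks which positions of a padded batch hold real tokens, so that padded positions can be ignored when computing a loss. A minimal sketch, assuming that the padding value never occurs as a real token; ``get_mask`` is a hypothetical helper, not the function documented above.

```python
def get_mask(sequences, pad_value=0):
    """Hypothetical helper: 1 for real tokens, 0 for padding."""
    return [[0 if token == pad_value else 1 for token in s] for s in sequences]

mask = get_mask([[4, 3, 5, 0, 0], [5, 3, 9, 4, 9]])
# mask -> [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```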