This repository contains a framework with a GPU implementation of generalized convolution operators. The framework is designed for large image data sets and can run in a distributed system.
A fast distributed GPU-based convolution algorithm using CUDA, MPI and pthreads.
## About The Project
Common image processing operators such as Gaussian blurring, certain edge detectors, dilations and erosions can all be expressed as convolutions. Performing convolutions on large image data sets takes a significant amount of time. To improve the performance of these operators, parallelization strategies can be employed. We propose GenConv: a framework that can run in a distributed setup and makes use of CUDA to perform convolution operators on the GPU. It provides the ability to do convolutions, dilations and erosions. The programmer can chain and customize these operations in any way they see fit.
## Getting Started
To get a local copy up and running follow these simple steps.
### Prerequisites
You need the following to be able to compile and run the project:
Will create a release build of the program. The default is a debug build.
```sh
make KERNEL_SCRIPT=processingScripts/processing.cu
```
This will set the processing script to `src/cuda/processingScripts/processing.cu`. This is useful when multiple scripts are present and you want to switch between them on consecutive runs. Note that only one processing script can be used at a time.
All of these optional arguments can be used at the same time.
### Running
#### SLURM
To run the program on a SLURM cluster, you can look at one of the `benchmark.sh` scripts for inspiration. Provided the configuration is correct, you can use:
```sh
srun ./conv job.txt outputdir
```
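For reference, here is a minimal sketch of a SLURM batch script wrapping that command. The job name, node count, GPU request, and time limit are placeholder values, not taken from the project's `benchmark.sh` scripts; adjust them to your cluster.

```sh
#!/bin/bash
# Illustrative batch script -- resource requests are placeholders.
#SBATCH --job-name=genconv
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1
#SBATCH --time=00:10:00

srun ./conv job.txt outputdir
```

Submit it with `sbatch`; SLURM then launches the tasks via `srun`.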
#### Single machine
You can run the project on a single machine as follows:
```sh
mpirun -np 1 ./conv job.txt outputDir
```
Alternatively it can also be run without MPI:
```sh
./conv job.txt outputDir
```
# Job files
GenConv uses a custom job file format that describes some basic properties of the input images and lists which images to process. The file follows this format:
```
3
8
256 256
0 0
inputImages/image1.pgm
inputImages/image2.pgm
inputImages/image3.pgm
```
The first line indicates the number of images to process. The second line states the number of bits used per pixel (the dynamic range). The third line gives the maximum dimensions any image in the job can have. The fourth line indicates how many pixels of padding are added on each side in each dimension. The remaining lines list the images to be processed.
As of now, only `.pgm` images are supported. The implementation is structured so that adding support for other image formats is straightforward.
# Kernels
The application uses a simple text format for convolution kernels:
```
3 3
0 1 0
1 -4 1
0 1 0
```
The first line specifies the `width` and `height` of the kernel. It is followed by `height` lines, each containing `width` kernel elements.
# Making changes to the Image Processing
The processing steps the program performs are defined in `src/cuda/processingScripts/processing.cu`. You can either alter this file, or add a new file and pass it as a make argument.
Image processing is often a matter of connecting small Lego blocks in whatever way fits your use case. Doing this via a configuration system is impossible without significant performance penalties. As such, it is up to the programmer to define the sequence of operations they want to perform. A basic understanding of CUDA is required to achieve optimal performance here.
When making changes to the script, the `cudaConfig.h` in `src/configs` should be updated accordingly, in particular the maximum kernel dimensions. CUDA needs to know these at compile time, because constant memory cannot be allocated dynamically at runtime.
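A sketch of the underlying idea (the macro and symbol names below are illustrative, not the actual identifiers in `cudaConfig.h`):

```cuda
// Illustrative only: these names are not the framework's identifiers.
// Constant memory must be sized at compile time, so the maximum
// kernel dimensions are fixed as compile-time constants.
#define MAX_KERNEL_WIDTH  17
#define MAX_KERNEL_HEIGHT 17

// Uploaded once with cudaMemcpyToSymbol(...) and then read by every
// thread during the convolution.
__constant__ float devKernel[MAX_KERNEL_WIDTH * MAX_KERNEL_HEIGHT];
```

Any kernel you use must fit within these bounds, which is why the header needs updating when your processing script uses larger kernels.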
Ideally, the only two places the programmer ever needs to change things are their processing script and a slight update to `cudaConfig.h` to accommodate any kernels they might use.
# Splitter & Combiner
The application also compiles two additional executables: `splitter` and `combiner`. The `splitter` can be used to split a single image into multiple smaller tiles (optionally with padding). The `combiner` can be used to combine tiles back into a single image. Note that the combiner requires the input tiles to follow the same naming convention as the tiles generated by the splitter.