In this chapter, we'll create a basic CUDA program that performs vector addition using the GPU. The program consists of a CUDA kernel that adds corresponding elements from two input vectors and stores the result in an output vector.
Let's go through the code step by step:
- Kernel Function (`vectorAdd`):
  - The `vectorAdd` function is the heart of our CUDA program. It runs on the GPU and performs the vector addition.
  - It takes four arguments:
    - `A`: Pointer to the first input vector.
    - `B`: Pointer to the second input vector.
    - `C`: Pointer to the output vector (where the result will be stored).
    - `size`: Size of the vectors (number of elements).
  - Inside the kernel, each thread computes the sum of corresponding elements from `A` and `B` and stores the result in `C`.
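As a minimal sketch of the kernel described above: the text names only the four arguments, so the `float` element type, the one-dimensional index formula, and the bounds check below are assumptions rather than the original listing.

```cuda
#include <cuda_runtime.h>

// Sketch of the vectorAdd kernel. The float element type and the
// bounds check are assumptions; the text only names A, B, C, and size.
__global__ void vectorAdd(const float* A, const float* B, float* C, int size)
{
    // Global index of this thread across the whole 1D grid.
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // The grid may contain more threads than elements, so skip the extras.
    if (i < size) {
        C[i] = A[i] + B[i];
    }
}
```

The guard matters whenever `size` is not a multiple of the block size: without it, the surplus threads in the last block would read and write past the end of the vectors.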
- Main Function:
  - The `main` function sets up the host (CPU) and device (GPU) memory, initializes input vectors, launches the kernel, and retrieves the result.
  - Key steps in the `main` function:
    - Allocate memory for vectors (`h_A`, `h_B`, and `h_C`) on the host.
    - Initialize input vectors (`h_A` and `h_B`) with sample values.
    - Allocate memory for vectors (`d_A`, `d_B`, and `d_C`) on the device (GPU).
    - Copy data from host to device using `cudaMemcpy`.
    - Launch the `vectorAdd` kernel with appropriate block and grid dimensions.
    - Copy the result back from the device to the host.
    - Print the result (output vector `h_C`).
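The steps listed for `main` can be put together roughly as follows. This is a sketch under stated assumptions, not the original `vector_addition.cu`: the `float` element type, the vector length of 16, the sample values, and the 256 threads per block are all illustrative choices, and error checking is left out to keep the flow visible.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per element; the float type is an assumption.
__global__ void vectorAdd(const float* A, const float* B, float* C, int size)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < size) {
        C[i] = A[i] + B[i];
    }
}

int main()
{
    const int size = 16;                         // illustrative vector length
    const size_t bytes = size * sizeof(float);

    // Allocate host vectors and fill the inputs with sample values.
    float* h_A = new float[size];
    float* h_B = new float[size];
    float* h_C = new float[size];
    for (int i = 0; i < size; ++i) {
        h_A[i] = static_cast<float>(i);
        h_B[i] = static_cast<float>(2 * i);
    }

    // Allocate device vectors.
    float* d_A = nullptr;
    float* d_B = nullptr;
    float* d_C = nullptr;
    cudaMalloc(&d_A, bytes);
    cudaMalloc(&d_B, bytes);
    cudaMalloc(&d_C, bytes);

    // Copy the inputs from host to device.
    cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, bytes, cudaMemcpyHostToDevice);

    // Launch enough blocks to cover every element (rounding up).
    const int threadsPerBlock = 256;             // assumed block size
    const int numBlocks = (size + threadsPerBlock - 1) / threadsPerBlock;
    vectorAdd<<<numBlocks, threadsPerBlock>>>(d_A, d_B, d_C, size);

    // Copy the result back; this blocking copy runs after the kernel finishes.
    cudaMemcpy(h_C, d_C, bytes, cudaMemcpyDeviceToHost);

    // Print the output vector.
    for (int i = 0; i < size; ++i) {
        std::printf("%g + %g = %g\n", h_A[i], h_B[i], h_C[i]);
    }

    // Release device and host memory.
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);
    delete[] h_A;
    delete[] h_B;
    delete[] h_C;
    return 0;
}
```

The host vectors are allocated with `new[]` here because the walkthrough speaks of deleting them; `malloc`/`free` or `std::vector` would work equally well.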
- Memory Allocation and Transfer:
  - We allocate memory for vectors on both the host and the device.
    - `cudaMalloc` allocates memory on the device.
    - `cudaMemcpy` transfers data between host and device.
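Both `cudaMalloc` and `cudaMemcpy` return a `cudaError_t`, so it can be worth checking each call. The sketch below sends a buffer to the device and back; the `CHECK_CUDA` macro and the `roundTrip` function are hypothetical helpers introduced for illustration, not part of the CUDA API or of the original program.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK_CUDA(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Hypothetical helper: copy `count` floats to the device and back again.
void roundTrip(const float* h_in, float* h_out, int count)
{
    const size_t bytes = count * sizeof(float);
    float* d_buf = nullptr;

    CHECK_CUDA(cudaMalloc(&d_buf, bytes));                               // device allocation
    CHECK_CUDA(cudaMemcpy(d_buf, h_in, bytes, cudaMemcpyHostToDevice));  // host -> device
    CHECK_CUDA(cudaMemcpy(h_out, d_buf, bytes, cudaMemcpyDeviceToHost)); // device -> host
    CHECK_CUDA(cudaFree(d_buf));                                         // release device memory
}
```

The last argument of `cudaMemcpy` selects the direction of the copy, which is why the same function serves both the upload of the inputs and the download of the result.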
- Kernel Launch:
  - We launch the `vectorAdd` kernel using `<<<numBlocks, threadsPerBlock>>>` syntax. `numBlocks` and `threadsPerBlock` determine the grid and block dimensions.
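A common way to choose `numBlocks` is to divide the vector length by `threadsPerBlock` and round up, so that every element gets a thread. The helper below is a sketch of that arithmetic; the name `blocksFor` is hypothetical and the value 256 is an assumed block size, not one given in the text.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: number of blocks needed so that
// numBlocks * threadsPerBlock >= size (ceiling division).
constexpr int blocksFor(int size, int threadsPerBlock)
{
    return (size + threadsPerBlock - 1) / threadsPerBlock;
}

// Worked examples, checked at compile time.
static_assert(blocksFor(1000, 256) == 4, "1000 elements need 4 blocks of 256");
static_assert(blocksFor(1024, 256) == 4, "an exact multiple needs no extra block");
static_assert(blocksFor(1025, 256) == 5, "one extra element needs one more block");
```

With 1000 elements and 256 threads per block, the launch creates 4 × 256 = 1024 threads, so 24 of them have no element to process. A launch such as `vectorAdd<<<blocksFor(size, 256), 256>>>(d_A, d_B, d_C, size)` therefore relies on the kernel comparing its index against `size` before touching memory.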
- Clean Up:
  - We free the allocated device memory using `cudaFree`.
  - We also delete the host vectors (`h_A`, `h_B`, and `h_C`) to avoid memory leaks.
- Compile the Code:
  - Open your terminal or command prompt.
  - Navigate to the folder containing `vector_addition.cu`.
  - Compile the code using `nvcc` (NVIDIA CUDA Compiler): `nvcc vector_addition.cu -o vector_addition`
- Run the Executable:
  - Execute the compiled binary: `./vector_addition`
  - You'll see the result of vector addition printed to the console.