---
title: "Enable CUDA Compilation on Cppyy-Numba generated IR"
layout: post
excerpt: "The Enable CUDA Compilation on Cppyy-Numba generated IR project, part of Google Summer of Code 2024, aims to demonstrate Cppyy's capability to provide CUDA paradigms to Python users without any compromise in performance."
sitemap: false
author: Riya Bisht
permalink: blogs/gsoc24_riya_bisht_introduction_blog/
date: 2024-05-30
tags: gsoc cuda numba llvm cppyy
---

### Introduction

I'm Riya Bisht, a third-year Computer Science & Engineering undergrad at Graphic Era University, India, and a Google Summer of Code 2024 contributor on a project related to Cppyy, an automatic, runtime Python-C++ binding generator developed by the Compiler Research Group. I am interested in low-level systems, particularly compilers and runtimes, and I love to stay updated on science, CERN, and the mysteries of the universe. My hobbies include writing blogs and reading about technoculture on the web.

**Mentors**: Wim Lavrijsen, Aaron Jomy, Vassil Vassilev, Jonas Rembser

### Personal Motivation

I was introduced to this project while working on a personal research project related to WebGPU/WASM and low-level systems. I wanted to gain experience and technical skills in GPU programming and low-level systems, mainly compilers, because this will help me contribute to my research project and to the wider scientific community. From my research and my experience in other open-source projects, I have found that not many people are aware of compilers and low-level systems, largely because of the widespread belief that they are hard to understand. My goal for the future is to remove this barrier to entry and make low-level systems more accessible to beginners.

### Introduction to Cppyy and the problem statement

Cppyy is an automatic, runtime Python-C++ binding generator that makes it possible to call C++ code from Python and vice versa. This enables interoperability between the two language ecosystems, avoids cross-language overhead, and promotes heterogeneous computing. Initial support for Numba, a Python JIT compiler, has already been added: it compiles loops containing C++ objects, methods, and functions defined via Cppyy into fast machine code. This project seeks to build on Cppyy's integration with Numba to enable the compilation of CUDA C++ code defined via Cppyy into efficient machine code.
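
For context, the existing CPU-side integration already lets Numba-compiled loops call into C++ defined through Cppyy. The snippet below is a minimal sketch of that workflow, assuming `cppyy`, `numba`, and NumPy are installed; the `square` and `sum_of_squares` names are illustrative, not taken from the project:

```python
import numba
import numpy as np

import cppyy
import cppyy.numba_ext  # registers Cppyy proxies with Numba's typing machinery

# Declare a small C++ function at runtime; Cling JIT-compiles it.
cppyy.cppdef("""
double square(double x) { return x * x; }
""")

@numba.njit
def sum_of_squares(values):
    total = 0.0
    for i in range(len(values)):
        # The C++ function is called through its Cppyy proxy inside the
        # Numba-compiled loop, avoiding Python-level call overhead.
        total += cppyy.gbl.square(values[i])
    return total

print(sum_of_squares(np.arange(10.0)))
```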

### Importance of this project

Heterogeneous computing is the future. The scientific community relies heavily on GPGPU (General-Purpose Graphics Processing Unit) computing, which uses both CPUs and GPUs to run workloads according to their requirements. This architecture pushes scientists to learn low-level programming interfaces like CUDA (Compute Unified Device Architecture), which come with a steep learning curve. Instead, we can use Python, a language far more familiar to the scientific ecosystem. Cppyy can provide efficient Python-CUDA C++ bindings at runtime, letting scientists leverage GPU acceleration from a user-friendly language with a rich ecosystem, without compromising on performance. Since plain Python can be slow compared to performant systems programming languages like C++, we will use Numba, a high-performance Python JIT compiler that produces fast machine code from Python code.

### Implementation Approach and Plans

Milestones of this project include:
1. **Implementing support for parsing and declaring CUDA code defined in Cppyy**: Cppyy is a Python frontend that relies on `proxies` and `reflection`. C++ objects and functions are exposed to the Python side through proxies, and in the context of cppyy, `reflex` allows Python code to inspect and interact with C++ classes and functions as if they were Python objects. The C++ reflection information includes return types, member offsets, aggregate types (classes, structs), and namespaces. As part of my evaluation task for this project, CUDA code defined through cppyy was able to report the CUDA version and CUDA device properties, such as the device name and memory clock rate, by calling CUDA APIs; a sketch of this is shown after this list. This functionality was enabled by setting the environment variable `CLING_ENABLE_CUDA` to `1`, which activates Cling's CUDA backend. However, based on recent findings, `cppyy-backend` (`cppyy-cling`) cannot handle CUDA kernel invocation from `cppdef`. Hence, to add CUDA support, it is necessary to implement a cppyy helper function called `cudadef`, similar to `cppdef`. This is crucial for isolating CUDA code from C++ code in cppyy. Furthermore, this implementation will allow the backend to load Cling only once, with the CUDA headers enabled in the precompiled header (PCH). This approach can eliminate irrelevant errors that might occur when invoking Cling twice: once with CUDA headers enabled for executing GPU kernels, and again without CUDA headers in the PCH for executing CPU code.
By separating the CUDA and C++ code execution paths, cppyy can provide a more stable and efficient environment for integrating CUDA functionality into Python.

2. **Designing and developing the CUDA compilation pipeline**: At present, CUDA compilation is supported by adding CUDA headers to the PCH (precompiled header) at runtime, but this leaves control with Cling, the interactive C++ interpreter. We want to move that control from Cling to Numba, using Numba decorators while it invokes GPU kernels from Cppyy. Numba uses the proxies to obtain function pointers and then runs the LLVM compilation passes using `llvmlite`. The scope of the project is therefore to rely on Numba so that we do not have to deal with Cling. This can include adding:
- Support for helpers in `numba_ext.py` to simplify launching CUDA kernels directly from Python.
- Support for CUDA-specific data types in `LLVM IR`.

The research for this part of the project is still ongoing.

3. **Testing and documentation support**: Prepare comprehensive tests to ensure functionality and robustness, and create detailed documentation, including debugging guides, for users and developers.

4. **Future scope**: Provide further optimization techniques for extracting more performance out of GPUs.
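
As a concrete reference for the first milestone, the snippet below roughly reconstructs the evaluation task: querying the CUDA version and device properties through cppyy's current C++ path. It is only a sketch; it assumes a CUDA-capable machine and a cppyy/Cling build with the CUDA backend available, and `print_device_info` is a hypothetical helper written for illustration, not part of cppyy's API.

```python
import os
os.environ["CLING_ENABLE_CUDA"] = "1"   # enable Cling's CUDA backend before cppyy is imported

import cppyy

cppyy.cppdef("""
#include <cuda_runtime.h>
#include <cstdio>

void print_device_info() {
    int version = 0;
    cudaRuntimeGetVersion(&version);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    std::printf("CUDA runtime version: %d\\n", version);
    std::printf("Device name:          %s\\n", prop.name);
    std::printf("Memory clock rate:    %d kHz\\n", prop.memoryClockRate);
}
""")

cppyy.gbl.print_device_info()
```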

Upon successful completion, a proof of concept along the lines of the code snippet below can be expected:

```python
import numba
import cppyy
import cppyy.numba_ext

cppyy.cudadef('''
__global__ void MatrixMul(float* A, float* B, float* out) {
    // kernel logic for matrix multiplication
}
''')

@numba.njit
def run_cuda_mul(A, B, out):
    # Allocate memory for the input and output arrays on the GPU (d_A, d_B, d_out)
    # Define grid and block dimensions (griddim, blockdim)
    # Launch the kernel
    MatrixMul[griddim, blockdim](d_A, d_B, d_out)
```
This would allow Python users to utilize CUDA for parallel computing on GPUs while maintaining high performance via Numba, as illustrated by the matrix multiplication snippet above.

### Conclusion

The impact of this project extends far beyond Cppyy itself, as it empowers the scientific community by giving Python users direct access to the performance and capabilities of C++ libraries. CUDA support in the Python ecosystem through Cppyy and Numba can help accelerate research and development in scientific computing domains that rely on both languages, such as data analysis (ROOT), machine learning, and computational sciences like simulating genetic code and protein structures.

### Related Links

- [Cppyy Repository](https://github.com/wlav/cppyy)
- [Project Description](https://hepsoftwarefoundation.org/gsoc/2024/proposal_Cppyy-Numba-CUDA.html)
- [GSoC Project Proposal](/assets/docs/Riya_Bisht_GSoC2024_Proposal.pdf)
- [My GitHub Profile](https://github.com/chococandy63)
