
Commit 934ecf6

Changes requested by David and Aaron
- Added the Code example suggested by Aaron
- Focused on Achievements of the research
- Removed verbose sections about C++/Python
- Revamped the document structure with clear headings, for better readability
1 parent 56b9a7e commit 934ecf6

1 file changed: +152 -102 lines changed


_posts/2023-04-05-language-interoperability-using-cppyy-and-cling.md

Lines changed: 152 additions & 102 deletions
@@ -17,88 +17,69 @@ date: 2023-04-05
1717
margin: 0 auto;
1818
{% endcapture %}
1919

20-
Scientific software is constantly being challenged by enthusiasts trying to
21-
test the boundaries of programming languages, in search of better performance
22-
and simpler workflows. Interactive C++ interpreters such as Cling and ClangRepl
23-
presented new possibilities with an incremental compilation infrastructure that
24-
is available at runtime.
25-
26-
This means that Python can interact with C++ on an on-demand basis, and
27-
bindings can be automatically constructed at runtime. This provides
28-
unprecedented performance and does not require direct support from library
29-
authors.
30-
31-
The Compiler Research team presented these findings in the paper: [Efficient
32-
and Accurate Automatic Python Bindings with cppyy & Cling]. It presents the
33-
enhancements in language interoperability using Cling with cppyy (an
34-
automatic, run-time, Python-C++ bindings generator). Following is a high-level
35-
summary of these findings.
36-
37-
38-
### Background
39-
C++ gained early adoption in scientific research fields due to its high
40-
performance capabilities. But the interactive nature of Python and the gentler
41-
learning curve led to higher adoption rates elsewhere, and as a result, it saw
42-
exponential advancements in capabilities that were ideal for data science
43-
research. Python shines when steering infrastructure written in high
44-
performance language such as C or C++. However, it is challenging to write the
45-
glue layers between both languages for every package available in the C++
46-
ecosystem.
47-
48-
This is where the usefulness of language interoperability becomes evident.
49-
However, this requires an advanced integration solution, especially for high
50-
performance code that can suffer from penalties crossing the language barrier.
20+
### Introduction
21+
22+
Scientific software development continually seeks to balance the high
23+
performance of languages like C++ with the user-friendly nature of Python.
24+
This article summarizes the advancements in language interoperability
25+
presented in the paper [Efficient and Accurate Automatic Python Bindings with
26+
cppyy & Cling].
27+
28+
This research focuses on enhancing language interoperability with cppyy,
29+
enabling uniform cross-language execution environments. It illustrates the use
30+
of advanced C++ in Numba-accelerated Python through cppyy. This required
31+
re-engineering parts of cppyy to use upstream LLVM components. cppyy was
32+
further empowered with a C++ reflection library, CppInterOp, which offers
33+
interoperability primitives based on Cling and Clang-Repl (interactive C++ interpreters).
34+
35+
### Key Components
36+
37+
38+
#### 1. Cling: Interactive C++ Interpreter
39+
40+
Cling, an interactive C++ interpreter based on Clang/LLVM, provides the
41+
foundation for cppyy's ability to interact with C++ code dynamically and
42+
efficiently.
43+
44+
#### 2. cppyy: An Automatic Runtime Bindings Generator
45+
46+
cppyy is a tool that automatically generates Python bindings for C++ code at
47+
runtime. It allows Python to interact with C++ on an on-demand basis.
48+
49+
#### 3. Numba: A Just-in-Time (JIT) Compiler for Python
50+
51+
Numba is capable of compiling Python code while targeting either the CPU or
52+
the GPU, and providing interfaces to use the JITed closures from low-level
53+
libraries.
5154

5255
![numba extension](/images/blog/cppyy-numba-1.png){: style="{{ image_style }}"}
5356

54-
Numba, a just-in-time (JIT) compiler for Python, is a tool that is ideal for
55-
this task (with some enhancements). Numba is capable of compiling Python code
56-
while targeting either the CPU or the GPU, and providing interfaces to use the
57-
JITed closures from low-level libraries. Numba helps lower Python to machine
58-
code level and minimizes costly language crossings. In order to provide the
59-
missing links for this research, Numba was also integrated with the cppyy
60-
project -- an automatic runtime bindings generator based using interactive C++
61-
to connect to the Python runtime.
62-
63-
The target of this research was to demonstrate a generic prototype that
64-
automatically brings advanced C++ features (e.g., highly optimized numeric
65-
libraries) to Numba-accelerated Python, with help from cppyy. This required
66-
re-engineering of the cppyy-backend to directly use LLVM components. A new
67-
CppInterOp library was also introduced to implement interoperability
68-
primitives based on Cling and Clang-Repl (also an interactive interpreter, a
69-
progression on Cling).
70-
71-
### Merits of using Python
72-
73-
Rather than writing all performance-critical code in a lower-level language
74-
(e.g., C), and then interpret it back to Python (using extensions), we wanted
75-
to lower the Python code itself to native level using JIT. This would enable
76-
the developer to stay in Python and write and debug the code in a single
77-
environment. We also needed this JIT code to work well with bound C++ code.
78-
Therefore, we used Numba as a Python JIT and integrated it with C++ using
79-
cppyy.
80-
81-
Interestingly, this approach makes it easy to use Python kernels in C++,
82-
without losing performance, enabling continued use of an existing C++
83-
codebase.
84-
85-
### Merits of using C++
86-
87-
C++ is evolving rapidly, enabling automation and a more expressive approach
88-
for better code quality and compiler optimization. Consecutively, cppyy (which
89-
is based on Cling, a C++ interpreter based on Clang/LLVM) helps bring better
90-
interactivity and runtime experiences to C++, and is able to evolve
91-
side-by-side, thanks to its roots in LLVM infrastructure. Together, these
92-
tools help address even the previously unresolved corner cases at runtime in
93-
either C++ or Python, as appropriate.
57+
Numba lowers Python code to machine code and minimizes costly
58+
language crossings. In order to provide the missing links for this research,
59+
Numba was also integrated with the cppyy project to connect to the Python
60+
runtime.
61+
62+
63+
#### 4. cppyy Integration with Numba
64+
65+
The research demonstrates how cppyy can be integrated with Numba, a
66+
just-in-time (JIT) compiler for Python. This integration aims to eliminate the
67+
overhead of crossing the language barrier, particularly in loop-heavy code.
68+
69+
#### 5. CppInterOp: A New Interoperability Library
70+
71+
A new library, CppInterOp, was introduced to implement interoperability
72+
primitives based on Cling and Clang-Repl. This library enhances the reflection
73+
capabilities necessary for efficient language interoperability.
74+
9475

9576
### Prototype Overview
9677

9778
The primary motivation behind the addition of Numba support in cppyy is the
9879
elimination of the overhead that arises from crossing the language barrier,
9980
which can multiply into large slowdowns when using loops with cppyy objects.
10081
Since Numba compiles Python code into machine code, it only crosses the
101-
language barrier once, and the loops thus run faster
82+
language barrier once, and the loops thus run faster.
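
To make this cost model concrete, here is a toy sketch in plain Python. It uses neither cppyy nor Numba: the `Boundary` wrapper, `add42`, and `sum_add42` are hypothetical stand-ins, and every call through `Boundary` is simply counted as one language crossing.

```python
# Toy model only: nothing here calls into C++ or is compiled by Numba.

class Boundary:
    """Wraps a callable and counts each call as one language crossing."""

    def __init__(self, func):
        self.func = func
        self.crossings = 0

    def __call__(self, *args):
        self.crossings += 1
        return self.func(*args)


def add42(x):
    # Stand-in for a bound C++ function.
    return x + 42


def sum_add42(values):
    # Stand-in for a whole loop that has been compiled to native code.
    total = 0
    for v in values:
        total += add42(v)
    return total


values = list(range(100))

# Interpreted loop: the boundary is crossed once per element.
per_element = Boundary(add42)
interpreted_total = sum(per_element(v) for v in values)

# Compiled loop: the boundary is crossed once for the whole loop.
whole_loop = Boundary(sum_add42)
compiled_total = whole_loop(values)

print(per_element.crossings, whole_loop.crossings)
```

Both paths compute the same total; only the number of crossings differs, which is the effect the Numba integration targets.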
10283

10384
![numba extension](/images/blog/cppyy-numba-2.png){: style="{{ image_style }}"}
10485

@@ -110,41 +91,111 @@ operations. At the end, the output is boxed so that Python can use it. For
11091
this to work, Numba needs to infer the types of not only the input and output
11192
but the intermediate variables as well.
11293

113-
To bring C++ to Numba, a custom module was developed on top of cppyy using the
114-
Numba low-level extension API. This enables Python programmers to selectively
115-
enable Numba acceleration for performance-critical tasks by importing
116-
`cppyy.numba_ext`
94+
#### Numba Extension for cppyy
95+
96+
A custom module was developed on top of cppyy using Numba's low-level
97+
extension API. This enables Python programmers to selectively enable Numba
98+
acceleration for performance-critical tasks involving C++ code (by importing
99+
`cppyy.numba_ext`).
117100

118101
![numba extension](/images/blog/cppyy-numba-3.png){: style="{{ image_style }}"}
119102

120-
The extension aids Numba's three phases which are- Typing, Lowering(to LLVM
121-
IR) and Boxing/Unboxing which process all (or most) C++ proxies held by the
103+
The extension supports Numba's three phases: Typing, Lowering (to LLVM
104+
IR), and Boxing/Unboxing, which process all (or most) C++ proxies held by the
122105
Python interpreter in the form of cppyy objects.
123106

124-
The biggest challenge while integrating cppyy support in Numba is to teach
125-
Numba what cppyy types and data mean. We approach this by utilising an
126-
improved reflection API within cppyy (`__cpp_reflex__`). Reflex returns
127-
information about cppyy objects within the scope of the Numba accelerated
128-
function. This allows us to inherit Numba's typing classes and populate them
129-
with more information without which we cannot box/unbox and lower to LLVM IR.
107+
#### Enhanced Reflection API
130108

131-
Let's look at the interaction between Cppyy, Numba and the Numba extension:
109+
This research included the creation of an improved reflection API
110+
(`__cpp_reflex__`) within cppyy. The Reflection API provides detailed
111+
information about cppyy objects (within the scope of the Numba-accelerated
112+
function). This lets the extension derive from Numba's type handling classes and
113+
populate them with the extra information required to box/unbox and lower the code
114+
to LLVM IR.
132115

133-
![numba extension](/images/blog/cppyy-numba-4.png){: style="{{ image_style }}"}
116+
#### cppyy Backend Re-Engineering
134117

135-
Numba analyzes a Python code and when it encounters cppyy types, it queries
136-
the cppyy’s pre-registered `numba_ext` module for the type information. If
137-
`numba_ext` encounters a type that it hasn't seen before, it queries cppyy’s
138-
new reflection API. This helps generate the necessary typing classes and
139-
lowering methods. Each core language construct (namespaces, classes, free
140-
functions, methods, data members, etc.) has its own implementation. This
141-
process provides Numba with the information needed to convert the function
142-
call to LLVM IR.
118+
The cppyy backend was re-engineered to directly use LLVM components. This
119+
modification aims to improve performance and sustainability, leveraging the
120+
advanced capabilities of the LLVM infrastructure.
143121

144-
### Benchmarks
122+
#### Interaction between cppyy, Numba and the Numba extension
123+
124+
Numba analyzes Python code, and when it encounters cppyy types, it queries
125+
cppyy’s pre-registered `numba_ext` module for the type information.
126+
127+
![numba extension](/images/blog/cppyy-numba-4.png){: style="{{ image_style }}"}
145128

146-
The following benchmarks were executed on a 3.1GHz Intel NUC Core i7-8809G CPU
147-
with 32G RAM.
129+
If `numba_ext` encounters a type that it hasn't seen before, it queries
130+
cppyy’s new reflection API. This helps generate the necessary type handling
131+
classes and lowering methods.
132+
133+
Each core language construct (namespaces, classes, free functions, methods,
134+
data members, etc.) has its own implementation. This process provides Numba
135+
with the information needed to convert the function call to LLVM IR.
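
The lookup flow described above (consult the registered type information first, and fall back to reflection only for unseen types) can be sketched as a small cache. This is a hypothetical illustration in plain Python: `TypeRegistry`, `fake_reflection`, and the metadata dictionaries are invented for this sketch and are not the actual `numba_ext` or `__cpp_reflex__` interfaces.

```python
# Hypothetical sketch: none of these names belong to cppyy or Numba.

class TypeRegistry:
    """Caches type descriptions and queries reflection only on a miss."""

    def __init__(self, reflect):
        self._reflect = reflect          # callable: type name -> description
        self._cache = {}
        self.reflection_queries = 0      # how often reflection was consulted

    def lookup(self, type_name):
        if type_name not in self._cache:
            # Unseen type: ask the reflection layer and remember the answer.
            self.reflection_queries += 1
            self._cache[type_name] = self._reflect(type_name)
        return self._cache[type_name]


def fake_reflection(type_name):
    # Stand-in for a reflection query; the metadata is made up.
    known = {
        "add42<int>": {"kind": "function", "returns": "int"},
        "add42<double>": {"kind": "function", "returns": "double"},
    }
    return known[type_name]


registry = TypeRegistry(fake_reflection)
first = registry.lookup("add42<int>")      # miss: triggers a reflection query
second = registry.lookup("add42<int>")     # hit: served from the cache
third = registry.lookup("add42<double>")   # miss: a different instantiation

print(registry.reflection_queries)
```

In this sketch, repeated lookups of the same instantiation reuse the cached description, so reflection is consulted once per distinct type.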
136+
137+
### Results
138+
139+
The following benchmark tests the running time of Numba-JITed functions with
140+
cppyy objects against their Python counterparts to obtain:
141+
142+
- the time taken by Numba to JIT the function (Numba JIT time),
143+
- the time taken by cppyy to create the typing info and possibly perform
144+
lookups and instantiate templates (cppyy JIT time),
145+
- the time taken to run the function after it has been JITed (Hot run time),
146+
and
147+
- the time taken to run the equivalent Python function.
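
One way to separate these quantities is to time the first (cold) call and a later (hot) call of the same function, then treat the difference as compilation overhead. The sketch below shows only that measurement pattern, under stated assumptions: `lazy_compile` is a hypothetical stand-in for a JIT decorator with a simulated compile delay, and this is not necessarily how the paper's numbers were collected.

```python
import time


def lazy_compile(func):
    """Toy stand-in for a JIT decorator: pays a one-off cost on the first call."""
    state = {"compiled": None}

    def wrapper(*args):
        if state["compiled"] is None:
            time.sleep(0.05)             # simulated compilation cost (assumption)
            state["compiled"] = func
        return state["compiled"](*args)

    return wrapper


def timed(func, *args):
    """Return the call's result and its wall-clock duration in seconds."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def trace_sum(n):
    # Plain Python reference kernel.
    return sum(i + 42 for i in range(n))


fast = lazy_compile(trace_sum)

cold_result, cold_time = timed(fast, 100)     # includes the simulated JIT cost
hot_result, hot_time = timed(fast, 100)       # already "compiled"
py_result, py_time = timed(trace_sum, 100)    # equivalent Python function

jit_time = cold_time - hot_time               # estimated compilation overhead
print(f"JIT {jit_time:.4f}s, hot {hot_time:.6f}s, Python {py_time:.6f}s")
```

A real measurement would average many hot runs rather than rely on a single call.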
148+
149+
> Note: The results are obtained on a 3.1GHz Intel NUC Core i7-8809G CPU with
150+
> 32 GB of RAM.
151+
152+
#### Benchmark
153+
154+
The following fixture for the ‘Templated free functions’ case was used to evaluate
155+
the speedup obtained by Numba JITing of cppyy objects. The other cases use a
156+
similar setup.
157+
158+
**C++ code in cppyy**
159+
160+
Using a C++ templated function, as declared in cppyy:
161+
162+
```python
import cppyy

# Define a templated C++ function through cppyy.
cppyy.cppdef(r"""
template<class T>
T add42(T t) {
    return T(t+42);
}
""")
```
170+
171+
**Python/C++ binding with cppyy**
172+
173+
Using a Python kernel to run this C++ function:
174+
175+
```python
# Plain Python kernel: every add42 call goes through cppyy.
def go_slow(a):
    trace = 0.0
    for i in range(a.shape[0]):
        trace += (
            cppyy.gbl.add42(a[i, i]) +
            cppyy.gbl.add42(int(a[i, i]))
        )
    return a + trace
```
184+
185+
**Numba-acceleration of cppyy**
186+
187+
Using the same kernel but adding the Numba JIT decorator to accelerate it:
188+
189+
```python
import numba
import cppyy.numba_ext  # enables Numba acceleration for cppyy objects

# Same kernel as go_slow, compiled by Numba in nopython mode.
@numba.jit(nopython=True)
def go_fast(a):
    trace = 0.0
    for i in range(a.shape[0]):
        trace += (
            cppyy.gbl.add42(a[i, i]) +
            cppyy.gbl.add42(int(a[i, i]))
        )
    return a + trace
```
148199

149200
For each benchmark case in the following table, a NumPy array of size 100 ×
150201
100 was passed to the function. The times indicated in the table are averages
@@ -178,11 +229,10 @@ For more technical details, please view the paper: [Efficient and Accurate Autom
178229

179230
### Summary
180231

181-
In this research, we presented a new reflection interface developed for Numba
182-
and cppyy (an automatic runtime bindings generator based on Cling), in order
183-
to facilitate integration with C++. This also required enhancements to cppyy
184-
to provide a fully automatic and transparent process for integration, without
185-
loss in performance.
232+
The advancements presented in this research, particularly the changes to
233+
cppyy and its integration with Cling and Numba, represent a significant step
234+
forward in creating more seamless and efficient multilingual programming
235+
environments.
186236

187237
This opens up several possibilities for developers. For example, they can
188238
develop and debug their code in Python, while using C++ libraries, and
