Scientific software in constantly being challenged by enthusiasts trying to
20
+
Scientific software is constantly being challenged by enthusiasts trying to
21
21
test the boundaries of programming languages, in search of better performance
22
-
and simpler workflows. One such breakthrough was achieved using Cling, a C++
23
-
interpreter, that has presented new possibilities with an incremental
24
-
compilation infrastructure that is available at runtime.
22
+
and simpler workflows. Interactive C++ interpreters such as Cling and Clang-Repl
23
+
presented new possibilities with an incremental compilation infrastructure that
24
+
is available at runtime.
25
25
26
26
This means that Python can interact with C++ on an on-demand basis, and
27
27
bindings can be automatically constructed at runtime. This provides
@@ -40,10 +40,10 @@ C++ gained early adoption in scientific research fields due to its high
40
40
performance capabilities. But the interactive nature of Python and the gentler
41
41
learning curve led to higher adoption rates elsewhere, and as a result, it saw
42
42
exponential advancements in capabilities that were ideal for data science
43
-
research. However, this does not discredit the usefulness of C++ in data
44
-
science. A lot of research infrastructure is still rooted in C++ and benefits
45
-
from some of its unique features (e.g., access to accelerators in
46
-
heterogeneous computing environments).
43
+
research. Python shines when steering infrastructure written in a
44
+
high-performance language such as C or C++. However, it is challenging to write
45
+
the glue layers between both languages for every package available in the C++
46
+
ecosystem.
47
47
48
48
This is where the usefulness of language interoperability becomes evident.
49
49
However, this requires an advanced integration solution, especially for high
@@ -56,8 +56,9 @@ this task (with some enhancements). Numba is capable of compiling Python code
56
56
while targeting either the CPU or the GPU, and providing interfaces to use the
57
57
JITed closures from low-level libraries. Numba helps lower Python to machine
58
58
code level and minimizes costly language crossings. In order to provide the
59
-
missing links for this research, Numba was also integrated with cppyy (an
60
-
automatic runtime bindings generator).
59
+
missing links for this research, Numba was also integrated with the cppyy
60
+
project -- an automatic runtime bindings generator that uses interactive C++
61
+
to connect to the Python runtime.
61
62
62
63
The target of this research was to demonstrate a generic prototype that
63
64
automatically brings advanced C++ features (e.g., highly optimized numeric
@@ -93,20 +94,39 @@ either C++ or Python, as appropriate.
93
94
94
95
### Prototype Overview
95
96
96
-
The primary motivation behind the addition of Numba support in cppyy is the elimination of the overhead that arises from crossing the languiage barrier, which can multiply into large slowdowns when using loops with cppyy objects. Since Numba compiles Python code into machine code it only crosses the language barrier once and the loops thus run faster
97
+
The primary motivation behind the addition of Numba support in cppyy is the
98
+
elimination of the overhead that arises from crossing the language barrier,
99
+
which can multiply into large slowdowns when using loops with cppyy objects.
100
+
Since Numba compiles Python code into machine code, it only crosses the
101
+
language barrier once, and the loops thus run faster.
Python is a dynamically typed language. It wraps and later unwraps objects (referred to as boxing/unboxing). These costly operations are eliminated with Numba, which unboxes the inputs of a function and converts it to machine code. This improves the performance of heavily looped code that perform certain operations. At the end, the output is boxed so that Python can use it. For this to work, Numba needs to infer the types of not only the input and output but the intermediate variables as well.
105
+
Python is a dynamically typed language. It wraps and later unwraps objects
106
+
(referred to as boxing/unboxing). These costly operations are eliminated with
107
+
Numba, which unboxes the inputs of a function and converts it to machine code.
108
+
This improves the performance of heavily looped code that performs certain
109
+
operations. At the end, the output is boxed so that Python can use it. For
110
+
this to work, Numba needs to infer the types of not only the input and output
111
+
but the intermediate variables as well.
101
112
102
-
To bring C++ to Numba, a custom module was developed on top of cppyy using the Numba low level extension API.
103
-
This enables Python programmers to selectively enable Numba acceleration for performance-critical tasks by importing `cppyy.numba_ext`
113
+
To bring C++ to Numba, a custom module was developed on top of cppyy using the
114
+
Numba low-level extension API. This enables Python programmers to selectively
115
+
enable Numba acceleration for performance-critical tasks by importing
The extension aids Numba's three phases which are- Typing, Lowering(to LLVM IR) and Boxing/Unboxing which process all (or most) C++ proxies held by the Python interpreter in the form of cppyy objects.
120
+
The extension aids Numba's three phases: typing, lowering (to LLVM IR), and
121
+
boxing/unboxing, which process all (or most) C++ proxies held by the
122
+
Python interpreter in the form of cppyy objects.
108
123
109
-
The biggest challenge while integrating cppyy support in Numba is to teach Numba what cppyy types and data mean. We approach this by utilising an improved reflection API within cppyy (`__cpp_reflex__`). Reflex returns information about cppyy objects within the scope of the Numba accelerated function. This allows us to inherit Numba's typing classes and populate them with more information without which we cannot box/unbox and lower to LLVM IR.
124
+
The biggest challenge while integrating cppyy support in Numba is to teach
125
+
Numba what cppyy types and data mean. We approach this by utilising an
126
+
improved reflection API within cppyy (`__cpp_reflex__`). Reflex returns
127
+
information about cppyy objects within the scope of the Numba accelerated
128
+
function. This allows us to inherit Numba's typing classes and populate them
129
+
with more information without which we cannot box/unbox and lower to LLVM IR.
110
130
111
131
Let's look at the interaction between cppyy, Numba and the Numba extension:
112
132
@@ -145,8 +165,11 @@ the case of templated free functions.
145
165
<br />
146
166
147
167
Where,
148
-
- Numba JIT time: The execution time of Numba JITed functions with cppyy objects against their Python counterparts (to obtain the time taken by Numba to JIT the function).
149
-
- cppyy JIT time: The time taken by cppyy to create the typing information and possibly to perform lookups and instantiate templates.
168
+
- Numba JIT time: The execution time of Numba JITed functions with cppyy
169
+
objects against their Python counterparts (to obtain the time taken by Numba
170
+
to JIT the function).
171
+
- cppyy JIT time: The time taken by cppyy to create the typing information and
172
+
possibly to perform lookups and instantiate templates.
150
173
- Hot run time: The time taken to execute the function after it has been JITed.
151
174
- Python run time: The time taken to execute the equivalent Python function.
152
175
- Speedup: Compares the Hot run time to Python run time.
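The timing columns defined above could be collected with a small harness along these lines. This is only a sketch using the Python standard library, not the benchmark code behind the reported numbers. The helper names `time_cold_and_hot` and `summarize_timings` are hypothetical. It also assumes that JIT time can be approximated as the first (cold) call minus the fastest later (hot) call, and that speedup is the ratio of Python run time to hot run time.

```python
import time


def time_cold_and_hot(fn, *args, repeats=5):
    """Return (cold_seconds, hot_seconds) for calling fn(*args).

    cold: the first call; for a JIT-compiled function this includes compilation.
    hot:  the fastest of `repeats` subsequent calls.
    """
    start = time.perf_counter()
    fn(*args)
    cold = time.perf_counter() - start

    hot = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        hot = min(hot, time.perf_counter() - start)
    return cold, hot


def summarize_timings(cold, hot, python_time):
    """Derive the table columns from raw timings (assumed definitions)."""
    return {
        # Assumption: JIT overhead is roughly cold run minus hot run.
        "jit_time": max(cold - hot, 0.0),
        "hot_run_time": hot,
        "python_run_time": python_time,
        # Assumption: speedup is Python run time divided by hot run time.
        "speedup": python_time / hot,
    }


if __name__ == "__main__":
    def python_sum_squares(n):
        total = 0.0
        for i in range(n):
            total += i * i
        return total

    # Stand-in workload: a JITed function would be timed the same way.
    cold, hot = time_cold_and_hot(python_sum_squares, 100_000)
    print(summarize_timings(cold, hot, python_time=hot))
```

In practice the cold and hot timings would come from the Numba-decorated function and `python_time` from its plain Python counterpart. Separating cppyy's own JIT time from Numba's would need additional instrumentation that this sketch does not attempt.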
@@ -165,7 +188,7 @@ This opens up several possibilities for developers. For example, they can
165
188
develop and debug their code in Python, while using C++ libraries, and
166
189
switching on the Numba JIT for selected performance-critical closures.
167
190
168
-
The results are promising, with 2-20 times speedup when using Numba to
191
+
The results are promising, with a 2 to 20 times speedup when using Numba to
169
192
accelerate cppyy through our extension. Further gains were demonstrated using
170
193
the Clang-Repl component of LLVM and the newly developed library CppInterOp.
171
194
Preliminary results show 1.4 to 144 times faster handling of templated code in