@@ -40,36 +40,60 @@ C++ gained early adoption in scientific research fields due to its high
performance capabilities. But the interactive nature of Python and its gentler
learning curve led to higher adoption rates elsewhere, and as a result, Python
saw rapid advancements in capabilities well suited to data science
- research. A lot of research infrastructure is still rooted in C++ and benefits
+ research. However, this does not diminish the usefulness of C++ in data
+ science. A lot of research infrastructure is still rooted in C++ and benefits
from some of its unique features (e.g., access to accelerators in
- heterogeneous computing environments), making it impossible, or at least
- undesirable to abandon it for the exciting new features that the world of
- Python has to offer.
+ heterogeneous computing environments).

This is where language interoperability proves its worth. Interoperability,
however, requires an advanced integration solution, especially for
high-performance code that is executed in diverse environments.

- This is where Numba, a just-in-time (JIT) compiler for Python comes in. It is
- capable of compiling Python code, while targeting either the CPU or the GPU,
- and providing interfaces to use the JITed closures from low-level libraries.
- Numba helps lower Python to machine code level and minimizes costly language
- crossings. However, Numba has some limitations in this context that are
- remediated with cppyy (an automatic runtime bindings generator).
+ Numba, a just-in-time (JIT) compiler for Python, is well suited to this task
+ (with some enhancements). Numba can compile Python code for either the CPU or
+ the GPU, and it provides interfaces for using the JITed closures from
+ low-level libraries. Numba lowers Python code to machine code and minimizes
+ costly language crossings. To provide the missing links for this research,
+ Numba was also integrated with cppyy (an automatic runtime bindings generator).

- The target of this research is to demonstrate a generic prototype that
+ The goal of this research was to demonstrate a generic prototype that
automatically brings advanced C++ features (e.g., highly optimized numeric
libraries) to Numba-accelerated Python, with help from cppyy. This required
re-engineering the cppyy back-end to use LLVM components directly. A new
CppInterOp library was also introduced to implement interoperability
primitives based on Cling and Clang-Repl (also an interactive C++ interpreter,
and a progression from Cling).

+ ### Merits of using Python
+
+ Rather than writing all performance-critical code in a lower-level language
+ (e.g., C) and then exposing it to Python through extension modules, we wanted
+ to lower the Python code itself to native code using a JIT. This lets
+ developers stay in Python, writing and debugging their code in a single
+ environment. We also needed this JITed code to work well with bound C++ code,
+ so we used Numba as the Python JIT and integrated it with C++ using cppyy.
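As a minimal sketch of this workflow (plain Python for writing and debugging, with the JIT switched on only for a hot kernel), the snippet below applies Numba's `njit` decorator to a hypothetical `harmonic_py` kernel. It does not involve cppyy or the reflection interface described in this post. The no-op `njit` fallback, used when Numba is not installed, is our own assumption added so the snippet runs anywhere; in that case nothing is compiled.

```python
try:
    from numba import njit  # Numba's nopython-mode JIT decorator
except ImportError:
    def njit(func=None, **options):
        """Assumed no-op stand-in, used only when Numba is not installed."""
        if func is None:
            return lambda f: f  # supports the @njit(...) form
        return func  # supports @njit and njit(f)


def harmonic_py(n):
    """Hypothetical kernel: the n-th harmonic number, 1 + 1/2 + ... + 1/n."""
    total = 0.0
    for k in range(1, n + 1):
        total += 1.0 / k
    return total


# Switch the JIT on selectively: the undecorated function stays available
# for stepping through in a regular Python debugger.
harmonic_jit = njit(harmonic_py)

if __name__ == "__main__":
    # With Numba installed, the first JIT call compiles the function to
    # machine code; later calls reuse the compiled version.
    print(harmonic_py(1_000_000), harmonic_jit(1_000_000))
```

Keeping the undecorated function around means the same source can be debugged as ordinary Python; Numba also honours the `NUMBA_DISABLE_JIT=1` environment variable for turning compilation off globally.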
+
+ This approach also makes it easy to use Python kernels from C++ without
+ losing performance, so an existing C++ codebase can remain in use.
+
+ ### Merits of using C++
+
+ C++ is evolving rapidly, with features that enable more automation and more
+ expressive code, which benefits both code quality and compiler optimization.
+ In turn, cppyy (which is based on Cling, a C++ interpreter built on
+ Clang/LLVM) brings better interactivity and a richer runtime experience to
+ C++, and it can evolve alongside the language thanks to its roots in the LLVM
+ infrastructure. Together, these tools help address corner cases that were
+ previously unresolved at runtime, in either C++ or Python as appropriate.
+
### Prototype Overview

To bring C++ to Numba, a reflection interface was developed on top of cppyy.
This enables Python programmers to develop and debug their code in Python and
- only switching on the Numba JIT for selected performance-critical tasks.
+ selectively switch on the Numba JIT for performance-critical tasks.

Python is a dynamically typed language. It wraps values in objects and later unwraps them
(referred to as boxing/unboxing). This is a costly operation that can be
@@ -94,7 +118,13 @@ call to LLVM IR.
### Benchmarks

The following benchmarks were executed on a 3.1 GHz Intel NUC Core i7-8809G CPU
- with 32G RAM.
+ with 32 GB RAM.
+
+ For each benchmark case in the following table, a NumPy array of size
+ 100 × 100 was passed to the function. The times indicated in the table are
+ averages over 3000 runs. The Numba-JITed functions achieve a minimum speedup
+ of **2.3 times** in the case of methods and a maximum speedup of nearly
+ **21 times** in the case of templated free functions.

<br />
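To make the averaging methodology concrete, here is a small sketch of how a per-call mean over 3000 runs and the resulting speedup factor could be computed with Python's standard `timeit` module. The names `average_seconds`, `speedup`, and `grid_sum` are hypothetical, and this is not the harness that produced the table; a real comparison would pass the same 100 × 100 NumPy array to the baseline and to the Numba-JITed version.

```python
import timeit


def average_seconds(func, arg, runs=3000):
    """Return the mean wall-clock time, in seconds, of one call func(arg)."""
    # timeit.timeit returns the total time taken by `number` calls.
    total = timeit.timeit(lambda: func(arg), number=runs)
    return total / runs


def speedup(baseline_seconds, accelerated_seconds):
    """Return how many times faster the accelerated version ran."""
    return baseline_seconds / accelerated_seconds


def grid_sum(grid):
    """Hypothetical stand-in kernel: sum every element of a 2-D grid."""
    return sum(sum(row) for row in grid)


if __name__ == "__main__":
    # A 100 x 100 input, mirroring the array size used in the benchmarks.
    grid = [[1.0] * 100 for _ in range(100)]
    t = average_seconds(grid_sum, grid)
    print(f"grid_sum: {t * 1e6:.1f} us per call (mean of 3000 runs)")
```

A Numba-JITed function compiles on its first call, so making one warm-up call before timing keeps compilation time out of the average.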
@@ -117,6 +147,22 @@ Where,

For more technical details, see the paper: [Efficient and Accurate Automatic Python Bindings with cppyy & Cling].

+ ### Summary
+
+ In this research, we presented a new reflection interface for Numba and cppyy
+ (an automatic runtime bindings generator based on Cling) that facilitates
+ integration with C++. This also required enhancements to cppyy to make the
+ integration process fully automatic and transparent, without loss of
+ performance.
+
+ This opens up several possibilities for developers. For example, they can
+ develop and debug their code in Python while using C++ libraries, and switch
+ on the Numba JIT only for selected performance-critical closures.

+ The results are promising, with speedups of roughly 2 to 20 times when using
+ Numba to accelerate cppyy through our extension. Further gains were
+ demonstrated using the Clang-Repl component of LLVM and the newly developed
+ CppInterOp library: preliminary results show 1.4 to 144 times faster handling
+ of templated code in cppyy, which is expected to indirectly improve
+ Numba-accelerated Python as well.

[Efficient and Accurate Automatic Python Bindings with cppyy & Cling]: https://arxiv.org/abs/2304.02712