|
| 1 | +--- |
| 2 | +title: "The Next Step in Language Interoperability using Cling and cppyy" |
| 3 | +layout: post |
| 4 | +excerpt: "Scientific software is constantly being challenged by enthusiasts trying to |
| 5 | +test the boundaries of programming languages, in search of better performance |
| 6 | +and simpler workflows. One such breakthrough was achieved using Cling, a C++ |
| 7 | +interpreter that has opened up new possibilities with an incremental |
| 8 | +compilation infrastructure that is available at runtime." |
| 9 | +sitemap: false |
| 10 | +permalink: blogs/language_interoperability_with_cling_and_cppyy/ |
| 11 | +date: 2023-04-05 |
| 12 | +--- |
| 13 | + |
| 14 | +{% capture image_style %} |
| 15 | + max-width: 80%; |
| 16 | + display: block; |
| 17 | + margin: 0 auto; |
| 18 | +{% endcapture %} |
| 19 | + |
| 20 | +Scientific software is constantly being challenged by enthusiasts trying to |
| 21 | +test the boundaries of programming languages, in search of better performance |
| 22 | +and simpler workflows. One such breakthrough was achieved using Cling, a C++ |
| 23 | +interpreter that has opened up new possibilities with an incremental |
| 24 | +compilation infrastructure that is available at runtime. |
| 25 | + |
| 26 | +This means that Python can interact with C++ on demand, and bindings can be |
| 27 | +constructed automatically at runtime. This delivers high performance and |
| 28 | +does not require direct support from library |
| 29 | +authors. |
| 30 | + |
| 31 | +The Compiler Research team presented these findings in the paper [Efficient |
| 32 | +and Accurate Automatic Python Bindings with cppyy & Cling], which describes |
| 33 | +enhancements in language interoperability using Cling with cppyy (an |
| 34 | +automatic, run-time, Python-C++ bindings generator). What follows is a |
| 35 | +high-level summary of these findings. |
| 36 | + |
| 37 | + |
| 38 | +### Background |
| 39 | +C++ gained early adoption in scientific research fields due to its |
| 40 | +high-performance capabilities. But the interactive nature of Python and its |
| 41 | +gentler learning curve led to higher adoption rates elsewhere, and as a result, |
| 42 | +Python saw rapid advancements in capabilities that were ideal for data science |
| 43 | +research. A lot of research infrastructure is still rooted in C++ and benefits |
| 44 | +from some of its unique features (e.g., access to accelerators in |
| 45 | +heterogeneous computing environments), making it impossible, or at least |
| 46 | +undesirable, to abandon C++ for the exciting new features that the world of |
| 47 | +Python has to offer. |
| 48 | + |
| 49 | +This is where the usefulness of language interoperability becomes evident. |
| 50 | +However, this requires an advanced integration solution, especially for |
| 51 | +high-performance code that is executed in diverse environments. |
| 52 | + |
| 53 | +Numba, a just-in-time (JIT) compiler for Python, helps here. It can compile |
| 54 | +Python code for either the CPU or the GPU, and it provides interfaces for |
| 55 | +using the JITed closures from low-level libraries. Numba lowers Python to |
| 56 | +machine code and minimizes costly language crossings. However, Numba has some |
| 57 | +limitations in this context, which cppyy (an automatic runtime bindings |
| 58 | +generator) remedies. |
| 59 | + |
| 60 | +The goal of this research is to demonstrate a generic prototype that |
| 61 | +automatically brings advanced C++ features (e.g., highly optimized numeric |
| 62 | +libraries) to Numba-accelerated Python, with help from cppyy. This required |
| 63 | +re-engineering the cppyy back-end to use LLVM components directly. A new |
| 64 | +CppInterOp library was also introduced to implement interoperability |
| 65 | +primitives based on Cling and Clang-Repl (an interactive interpreter that |
| 66 | +builds on Cling). |
| 67 | + |
| 68 | +### Prototype Overview |
| 69 | + |
| 70 | +To bring C++ to Numba, a reflection interface was developed on top of cppyy. |
| 71 | +This enables Python programmers to develop and debug their code in Python and |
| 72 | +switch on the Numba JIT only for selected performance-critical tasks. |
| 73 | + |
| 74 | +Python is a dynamically typed language. It wraps and later unwraps objects |
| 75 | +(referred to as boxing/unboxing). This is a costly operation that can be |
| 76 | +eliminated with Numba, while using the new Reflection API. The Reflection API |
| 77 | +uses a function called `__cpp_reflex__` that takes the reflection type and |
| 78 | +format as parameters and returns the requested information (e.g., an object’s |
| 79 | +C++ type). |
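
As a rough illustration of the query style described above, the following pure-Python sketch mocks an object that answers reflection queries. Only the method name `__cpp_reflex__` and its two parameters (reflection type and format) come from the text. The constants `TYPE`, `AS_STRING`, and `AS_TYPE` and the `FakeCppProxy` class are hypothetical stand-ins, and the real cppyy implementation may differ.

```python
import ctypes

# Hypothetical reflection kinds and formats (stand-ins, not cppyy's real constants).
TYPE = "type"
AS_STRING = "as_string"
AS_TYPE = "as_type"


class FakeCppProxy:
    """Mock of a bound C++ object; this is not a real cppyy proxy."""

    def __init__(self, cpp_type_name, ctypes_type, value):
        self._cpp_type_name = cpp_type_name
        self._ctypes_type = ctypes_type
        self.value = value

    def __cpp_reflex__(self, kind, fmt=AS_STRING):
        # Answer "what is your C++ type?" in the requested format.
        if kind == TYPE:
            if fmt == AS_STRING:
                return self._cpp_type_name
            if fmt == AS_TYPE:
                return self._ctypes_type
        raise ValueError(f"unsupported reflection query: {kind!r}/{fmt!r}")


x = FakeCppProxy("double", ctypes.c_double, 3.14)
type_name = x.__cpp_reflex__(TYPE, AS_STRING)    # C++ type as a string
native_type = x.__cpp_reflex__(TYPE, AS_TYPE)    # C++ type as a native type object
```

In this model, a caller that knows the C++ type up front can work with the underlying value directly instead of going through a boxed Python object, which is the cost the paragraph above refers to.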
| 80 | + |
| 81 | +Let's look at the interaction between Numba, the Numba extension, and cppyy. |
| 82 | + |
| 83 | +{: style="{{ image_style }}"} |
| 84 | + |
| 85 | +Numba analyzes Python code, and when it encounters cppyy types, it queries |
| 86 | +cppyy’s pre-registered `numba_ext` module for the type information. If |
| 87 | +`numba_ext` encounters a type that it hasn't seen before, it queries cppyy’s |
| 88 | +new reflection API. This helps generate the necessary typing classes and |
| 89 | +lowering methods. Each core language construct (namespaces, classes, free |
| 90 | +functions, methods, data members, etc.) has its own implementation. This |
| 91 | +process provides Numba with the information needed to convert the function |
| 92 | +call to LLVM IR. |
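
The lookup flow above (check known types first, fall back to reflection on a miss, then pick a handler per language construct) can be sketched in plain Python. This is a simplified model, not the actual `numba_ext` code: `TypeRegistry`, `fake_reflect`, and the handler output strings are hypothetical names introduced for illustration.

```python
# Simplified model of "query reflection only for types not seen before".
# All names here are hypothetical; this is not cppyy's numba_ext.

def fake_reflect(name):
    """Stand-in for a reflection query: report which construct a name is."""
    known = {"std": "namespace", "MyClass": "class", "add": "free_function"}
    return known[name]


class TypeRegistry:
    def __init__(self, reflect):
        self._reflect = reflect
        self._cache = {}            # name -> typing info built earlier
        self.reflection_calls = 0   # counts how often reflection was needed

        # One handler per core language construct.
        self._handlers = {
            "namespace": lambda n: f"NamespaceType({n})",
            "class": lambda n: f"ClassType({n})",
            "free_function": lambda n: f"FunctionType({n})",
        }

    def lookup(self, name):
        if name not in self._cache:
            # Unseen type: ask the reflection layer, then build typing info.
            self.reflection_calls += 1
            construct = self._reflect(name)
            self._cache[name] = self._handlers[construct](name)
        return self._cache[name]


registry = TypeRegistry(fake_reflect)
first = registry.lookup("MyClass")
second = registry.lookup("MyClass")   # served from the cache, no reflection call
func = registry.lookup("add")
```

The design point this tries to capture is that reflection is only consulted once per type, so repeated uses of the same C++ entity do not pay the lookup cost again.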
| 93 | + |
| 94 | +### Benchmarks |
| 95 | + |
| 96 | +The following benchmarks were executed on a 3.1 GHz Intel NUC Core i7-8809G CPU |
| 97 | +with 32 GB of RAM. |
| 98 | + |
| 99 | +<br /> |
| 100 | + |
| 101 | +| Benchmark Case | cppyy JIT time (s) | Numba JIT time (s) | Hot run time (s) | Python run time (s) | Speedup | |
| 102 | +|----------------------------|---------------------|---------------------|-------------------|----------------------|----------| |
| 103 | +| Function w/o args | 1.72e-03 | 3.33e-01 | 3.58e-06 | 1.73e-05 | 4.84× | |
| 104 | +| Overloaded functions | 1.05e-03 | 1.35e-01 | 4.51e-06 | 3.47e-05 | 7.70× | |
| 105 | +| Templated free functions | 8.92e-04 | 1.45e-01 | 3.48e-06 | 7.18e-05 | 20.66× | |
| 106 | +| Class data members | 1.43e-06 | 1.33e-01 | 5.87e-06 | 1.82e-05 | 3.10× | |
| 107 | +| Class methods | 2.16e-03 | 1.39e-01 | 6.06e-06 | 1.43e-05 | 2.36× | |
| 108 | + |
| 109 | +<br /> |
| 110 | + |
| 111 | +Where: |
| 112 | +- cppyy JIT time: The time taken by cppyy to create the typing information and possibly to perform lookups and instantiate templates. |
| 113 | +- Numba JIT time: The time taken by Numba to JIT the function, obtained by comparing the execution time of Numba-JITed functions that use cppyy objects against their Python counterparts. |
| 114 | +- Hot run time: The time taken to execute the function after it has been JITed. |
| 115 | +- Python run time: The time taken to execute the equivalent Python function. |
| 116 | +- Speedup: The ratio of Python run time to hot run time. |
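
As a quick check of how the Speedup column relates to the timing columns, the sketch below divides Python run time by hot run time for each row of the table. The inputs are the rounded values printed in the table, so the results can differ slightly from the published column in the last digit. The variable names are illustrative.

```python
# (hot run time, Python run time) in seconds, copied from the benchmark table.
timings = {
    "Function w/o args": (3.58e-06, 1.73e-05),
    "Overloaded functions": (4.51e-06, 3.47e-05),
    "Templated free functions": (3.48e-06, 7.18e-05),
    "Class data members": (5.87e-06, 1.82e-05),
    "Class methods": (6.06e-06, 1.43e-05),
}

# Speedup = Python run time / hot run time.
speedups = {case: py / hot for case, (hot, py) in timings.items()}

for case, factor in speedups.items():
    print(f"{case:26s} {factor:6.2f}x")
```

Note that this ratio covers only the steady-state (hot) runs. The one-time cppyy and Numba JIT costs are reported in separate columns and are not part of the Speedup figure.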
| 117 | + |
| 118 | +For more technical details, please see the paper: [Efficient and Accurate Automatic Python Bindings with cppyy & Cling]. |
| 119 | + |
| 120 | + |
| 121 | + |
| 122 | +[Efficient and Accurate Automatic Python Bindings with cppyy & Cling]: https://arxiv.org/abs/2304.02712 |