"<a href=\"https://colab.research.google.com/github/samaid/pyhpc-tutorial/blob/main/notebooks/9_1_nvmath-python_interop.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"id": "7b236cf1",
"metadata": {
"id": "7b236cf1"
},
"source": [
"# 9.1. `nvmath-python`: Interoperability with CPU and GPU tensor libraries\n",
"The goal of this exercise is to demonstrate how easily `nvmath-python` plugs into existing projects that rely on popular CPU or GPU array libraries, such as NumPy, CuPy, and PyTorch, and how easily a new project can use `nvmath-python` alongside these libraries."
]
},
{
"cell_type": "markdown",
"id": "e38c312d",
"metadata": {
"id": "e38c312d"
},
"source": [
"### Pure CuPy implementation\n",
"\n",
"This example demonstrates basic matrix multiplication of CuPy 2D arrays using `matmul`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b796dc7e",
"metadata": {
"id": "b796dc7e"
},
"outputs": [],
"source": [
"import cupy as cp\n",
"\n",
"# Prepare sample input data for matrix matmul\n",
"n, m, k = 2000, 4000, 5000\n",
"a = cp.random.rand(n, k)\n",
"b = cp.random.rand(k, m)\n",
"\n",
"# Perform matrix multiplication\n",
"result = cp.matmul(a, b)\n",
"\n",
"# Print the result\n",
"print(result)\n",
"\n",
"# Print CUDA device for each array\n",
"print(a.device)\n",
"print(b.device)\n",
"print(result.device)"
]
},
{
"cell_type": "markdown",
"id": "7528a6f8",
"metadata": {
"id": "7528a6f8"
},
"source": [
"### Using `nvmath-python` alongside CuPy\n",
"\n",
"This is a slight modification of the example above, in which the matrix multiplication is done with the corresponding `nvmath-python` implementation.\n",
"\n",
"Note that `nvmath-python` supports multiple frameworks, including CuPy. It uses the framework's memory pool and current stream for seamless integration. The result of each operation is a tensor of the same framework that was used to pass the inputs, and it is located on the same device as the inputs."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "311ee2e9",
"metadata": {
"id": "311ee2e9"
},
"outputs": [],
"source": [
"# The same matrix multiplication as in the previous example, but using nvmath-python\n",
"import nvmath\n",
"\n",
"# Perform matrix multiplication\n",
"result = nvmath.linalg.advanced.matmul(a, b)\n",
"\n",
"# Print the result\n",
"print(result)\n",
"\n",
"# Print CUDA device for each array\n",
"print(a.device)\n",
"print(b.device)\n",
"print(result.device)\n"
]
},
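{
"cell_type": "markdown",
"id": "numpy-interop-note",
"metadata": {
"id": "numpy-interop-note"
},
"source": [
"The same call should also accept CPU arrays. The cell below is an illustrative sketch, not part of the original notebook (the cell id `numpy-interop-note` is made up): passing NumPy arrays to `nvmath.linalg.advanced.matmul` should return a NumPy array, since the result follows the framework of the inputs, with `nvmath-python` handling host-device transfers."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "numpy-interop-code",
"metadata": {
"id": "numpy-interop-code"
},
"outputs": [],
"source": [
"# Illustrative sketch (not from the original notebook): the same matmul\n",
"# with NumPy inputs. The result type should match the input framework.\n",
"import numpy as np\n",
"import nvmath\n",
"\n",
"a_np = np.random.rand(200, 500)\n",
"b_np = np.random.rand(500, 400)\n",
"\n",
"result_np = nvmath.linalg.advanced.matmul(a_np, b_np)\n",
"print(type(result_np))"
]
},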
{
"cell_type": "markdown",
"id": "85b2ae1b",
"metadata": {
"id": "85b2ae1b"
},
"source": [
"As we can see, the code looks essentially the same. If you measure the performance of the two implementations above, it will be nearly identical.\n",
"\n",
"This is because CuPy and `nvmath-python` (as well as PyTorch) all use the CUDA-X Math Libraries as their engine, so it is up to the user which library to choose for solving this matrix multiplication problem.\n",
"\n",
"The next examples demonstrate cases where `nvmath-python` may become essential for reaching peak performance."