ArmDeveloperEcosystem · jasonrandrews · Jun 13, 2025 · Jun 11, 2025 · Jun 12, 2025 · Jun 12, 2025
diff --git a/.../learning-paths/servers-and-cloud-computing/bitmap_scan_sve2/01-introduction.md b/.../learning-paths/servers-and-cloud-computing/bitmap_scan_sve2/01-introduction.md
@@ -0,0 +1,66 @@
+---
+# User change
+title: "Optimize bitmap scanning in databases with SVE and NEON on Arm servers"
+
+weight: 2
+
+layout: "learningpathall"
+---
+## Overview
+
+Bitmap scanning is a core operation in many database systems. It's essential for powering fast filtering in bitmap indexes, Bloom filters, and column filters. However, these scans can become performance bottlenecks in complex analytical queries.
+
+In this Learning Path, you’ll learn how to accelerate bitmap scanning using Arm’s vector processing technologies, NEON and SVE, on Neoverse V2–based servers like AWS Graviton4.
+
+Specifically, you will:
+
+* Explore how to use SVE instructions on Arm Neoverse V2–based servers like AWS Graviton4 to optimize bitmap scanning
+* Compare scalar, NEON, and SVE implementations to demonstrate the performance benefits of specialized vector instructions
+
+## What is bitmap scanning in databases?
+
+Bitmap scanning involves searching through a bit vector to find positions where bits are set (`1`) or unset (`0`). 
+
+In database systems, bitmaps are commonly used to represent:
+
+* **Bitmap indexes**: each bit represents whether a row satisfies a particular condition
+* **Bloom filters**: probabilistic data structures used to test set membership
+* **Column filters**: bit vectors indicating which rows match certain predicates
+
+The operation of scanning a bitmap to find set bits is often in the critical path of query execution, making it a prime candidate for optimization.
+
+## The evolution of vector processing for bitmap scanning
+
+The implementations in this Learning Path progress from simple scalar code to vector processing:
+
+* **Generic scalar processing**: traditional bit-by-bit processing with conditional branches
+* **Optimized scalar processing**: byte-level skipping to avoid processing empty bytes
+* **NEON**: fixed-width 128-bit SIMD processing with vector operations
+* **SVE**: scalable vector processing with predication and specialized instructions such as `MATCH`, which was introduced with SVE2 and is supported on Neoverse V2
+
+## Set up your Arm development environment
+
+To follow this Learning Path, you will need:
+
+* An AWS Graviton4 instance running `Ubuntu 24.04`. 
+* A GCC compiler with SVE support
+
+First, install the required development tools:
+
+```bash
+sudo apt-get update
+sudo apt-get install -y build-essential gcc g++
+```
+{{% notice Tip %}}
+Compiler flags matter for performance on Arm, but so does the compiler version. For best results, use the latest available GCC version with SVE support. This Learning Path was tested with GCC 13, the default on Ubuntu 24.04. Newer versions should also work.
+{{% /notice %}}
+
+
+Create a directory for your implementations:
+```bash
+mkdir -p bitmap_scan
+cd bitmap_scan
+```
+
+## Next up: build the bitmap scanning foundation
+With your development environment set up, you're ready to dive into the core of bitmap scanning. In the next section, you’ll define a minimal bitmap data structure and implement utility functions to set, clear, and inspect individual bits.
diff --git a/...-paths/servers-and-cloud-computing/bitmap_scan_sve2/02-bitmap-data-structure.md b/...-paths/servers-and-cloud-computing/bitmap_scan_sve2/02-bitmap-data-structure.md
@@ -0,0 +1,88 @@
+---
+# User change
+title: "Build and manage a bit vector in C"
+
+weight: 3
+
+layout: "learningpathall"
+
+---
+## Bitmap data structure
+
+Start by defining a simple bitmap data structure that serves as the foundation for the different implementations. The structure has three components:
+
+* A byte array that stores the actual bits
+* The physical size in bytes
+* The logical size in bits
+
+For testing the different implementations in this Learning Path, you also need functions to generate and analyze the bitmaps.
+
+Using a file editor of your choice, copy the code below into `bitvector_scan_benchmark.c`:
+
+```c
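+// Standard headers used by the bit vector helpers below
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+// Define a simple bit vector structure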
+typedef struct {
+    uint8_t* data;
+    size_t size_bytes;
+    size_t size_bits;
+} bitvector_t;
+
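+// Create a new bit vector (returns NULL if allocation fails)
+bitvector_t* bitvector_create(size_t size_bits) {
+    bitvector_t* bv = (bitvector_t*)malloc(sizeof(bitvector_t));
+    if (bv == NULL) {
+        return NULL;
+    }
+    bv->size_bits = size_bits;
+    bv->size_bytes = (size_bits + 7) / 8;  // round up to whole bytes
+    bv->data = (uint8_t*)calloc(bv->size_bytes, 1);
+    if (bv->data == NULL) {
+        free(bv);
+        return NULL;
+    }
+    return bv;
+}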
+
+// Free bit vector resources
+void bitvector_free(bitvector_t* bv) {
+    free(bv->data);
+    free(bv);
+}
+
+// Set a bit in the bit vector
+void bitvector_set_bit(bitvector_t* bv, size_t pos) {
+    if (pos < bv->size_bits) {
+        bv->data[pos / 8] |= (1 << (pos % 8));
+    }
+}
+
+// Get a bit from the bit vector
+bool bitvector_get_bit(bitvector_t* bv, size_t pos) {
+    if (pos < bv->size_bits) {
+        return (bv->data[pos / 8] & (1 << (pos % 8))) != 0;
+    }
+    return false;
+}
+
+// Generate a bit vector with approximately the specified density
+bitvector_t* generate_bitvector(size_t size_bits, double density) {
+    bitvector_t* bv = bitvector_create(size_bits);
+
+    // Random positions can repeat, so the achieved density can be
+    // lower than the requested value
+    size_t num_bits_to_set = (size_t)(size_bits * density);
+
+    for (size_t i = 0; i < num_bits_to_set; i++) {
+        size_t pos = rand() % size_bits;
+        bitvector_set_bit(bv, pos);
+    }
+
+    return bv;
+}
+
+// Count set bits in the bit vector
+size_t bitvector_count_scalar(bitvector_t* bv) {
+    size_t count = 0;
+    for (size_t i = 0; i < bv->size_bits; i++) {
+        if (bitvector_get_bit(bv, i)) {
+            count++;
+        }
+    }
+    return count;
+}
+```
+
+## Next up: implement and benchmark your first scalar bitmap scan
+
+With your bit vector infrastructure in place, you're now ready to scan it for set bits, the core operation that underpins all bitmap-based filters in database systems.
diff --git a/...paths/servers-and-cloud-computing/bitmap_scan_sve2/03-scalar-implementations.md b/...paths/servers-and-cloud-computing/bitmap_scan_sve2/03-scalar-implementations.md
@@ -0,0 +1,89 @@
+---
+# User change
+title: "Implement scalar bitmap scanning in C"
+
+weight: 4
+
+layout: "learningpathall"
+
+
+---
+## Bitmap scanning implementations
+
+Bitmap scanning is a fundamental operation in performance-critical systems such as databases, search engines, and filtering pipelines. It involves identifying the positions of set bits (`1`s) in a bit vector, which is often used to represent filtered rows, bitmap indexes, or membership flags. 
+
+In this section, you'll implement two scalar approaches to bitmap scanning in C, starting with a simple per-bit baseline, followed by an optimized version that reduces overhead for sparse data.
+
+Both versions write the position of every set bit into an output array and return the number of positions found.
+
+### Generic scalar implementation
+
+This is the most straightforward implementation, checking each bit individually. It serves as the baseline for comparison against the other implementations to follow. 
+
+Copy the code below into the same file:
+
+```c
+// Generic scalar implementation of bit vector scanning (bit-by-bit)
+size_t scan_bitvector_scalar_generic(bitvector_t* bv, uint32_t* result_positions) {
+    size_t result_count = 0;
+
+    for (size_t i = 0; i < bv->size_bits; i++) {
+        if (bitvector_get_bit(bv, i)) {
+            result_positions[result_count++] = (uint32_t)i;
+        }
+    }
+
+    return result_count;
+}
+```
+
+This generic C implementation tests every bit, even when most bits are not set. Each test goes through `bitvector_get_bit()`, which adds a bounds check and, unless the compiler inlines it, a function call per bit. It also does not take advantage of any vector instructions.
+
+In the following implementations, you can address these inefficiencies with more optimized techniques.
+
+### Optimized scalar implementation
+
+This implementation adds byte-level skipping to avoid processing empty bytes. 
+
+Copy this optimized C scalar implementation code into the same file:
+
+```c
+// Optimized scalar implementation of bit vector scanning (byte-level)
+size_t scan_bitvector_scalar(bitvector_t* bv, uint32_t* result_positions) {
+    size_t result_count = 0;
+
+    for (size_t byte_idx = 0; byte_idx < bv->size_bytes; byte_idx++) {
+        uint8_t byte = bv->data[byte_idx];
+
+        // Skip empty bytes
+        if (byte == 0) {
+            continue;
+        }
+
+        // Process each bit in the byte
+        for (int bit_pos = 0; bit_pos < 8; bit_pos++) {
+            if (byte & (1 << bit_pos)) {
+                size_t global_pos = byte_idx * 8 + bit_pos;
+                if (global_pos < bv->size_bits) {
+                    result_positions[result_count++] = (uint32_t)global_pos;
+                }
+            }
+        }
+    }
+
+    return result_count;
+}
+```
+Instead of iterating through each bit individually, this implementation reads one byte (8 bits) at a time. The main optimization over the generic version is that a byte equal to zero is skipped with a single comparison instead of eight separate bit tests. The sparser the bitmap, the more bytes are skipped, so the savings grow as density falls.
+
+## Next up: accelerate bitmap scanning with NEON and SVE
+
+You’ve now implemented two scalar scanning routines:
+
+* A generic per-bit loop for correctness and simplicity
+* An optimized scalar version that improves performance using byte-level skipping
+
+These provide a solid foundation and a performance baseline, but scalar code examines at most one byte per loop iteration. To process more data per instruction, the next step is SIMD (Single Instruction, Multiple Data) execution.
+
+In the next section, you’ll explore how to use Arm NEON and SVE vector instructions to accelerate bitmap scanning. These approaches process multiple bytes at once and can significantly outperform scalar loops, especially on modern Arm-based CPUs like AWS Graviton4.