
Upgrade LeilFS FSAL to Ganesha V9.15 and add NFS-over-RDMA test #887

Draft
ralcolea wants to merge 2 commits into dev from nfs-ganesha-v9.15


@ralcolea


Summary

Upgrades the LeilFS FSAL from Ganesha V9.2 to V9.15 (ntirpc 7.2) and adds an integration test validating the FSAL over NFS-over-RDMA. V9.15 brings upstream fixes and a far more mature NFS/RDMA implementation, so this PR also enables RDMA in the Ganesha builds.

Two commits:

  1. feat(nfs-ganesha): upgrade FSAL to Ganesha V9.15 — bump the pinned Ganesha (cmake, Dockerfile, CI), enable -DUSE_NFS_RDMA=ON, and adapt the FSAL to the new API: readlink_() now takes a utf8string instead of a gsh_buffdesc.

  2. test(nfs-ganesha): add NFS/RDMA integration test — a GaneshaTests case that mounts a LeilFS export over RPC-over-RDMA and does a read/write roundtrip, plus the test-machine dependencies and sudoers it needs.

Motivation

RDMA is a transport-layer feature in Ganesha/libntirpc and is FSAL-independent, so a successful RDMA mount + I/O of a LeilFS export proves the LeilFS FSAL works over RDMA. V9.15 is the first stable line where the NFS/RDMA data path is usable (V9.2 brought up the listener but file-system ops were incomplete).

RDMA test design

  • Brings up a soft-RDMA device — SoftiWARP (siw), falling back to SoftRoCE (rdma_rxe) — bound to a real NIC, and mounts via its routable IP. A loopback device (lo, 127.0.0.1) opens the port but does not deliver the RPC data path for a same-host mount.
  • For siw it starts the iWARP port mapper (iwpmd) when needed, and stops it on cleanup only if the test launched it.
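
The selection and cleanup rules above can be sketched roughly as follows. This is not the test script from this PR: the function names (`pick_soft_rdma_type`, `iwpmd_needs_start`, `iwpmd_running`), the device name `test_rdma0`, and the injectable probe arguments are hypothetical, added so the logic can be exercised without root. The commented `rdma link add` line follows iproute2's `rdma link add NAME type TYPE netdev NETDEV` form.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the device-selection logic; not the PR's script.

# Print the soft-RDMA link type to use: "siw" if the siw module is known to
# the kernel, otherwise "rxe" if rdma_rxe is. Returns non-zero if neither is
# available, so the caller can skip. $1 is the probe command (defaults to
# modinfo) so a stub can be injected.
pick_soft_rdma_type() {
	local probe=${1:-modinfo}
	if "${probe}" siw >/dev/null 2>&1; then
		echo siw
	elif "${probe}" rdma_rxe >/dev/null 2>&1; then
		echo rxe
	else
		return 1
	fi
}

# Default "is iwpmd already running" check.
iwpmd_running() { pgrep -x iwpmd >/dev/null 2>&1; }

# Succeed only when the test itself must launch iwpmd: the link type is siw
# and no iwpmd process is running yet. $2 is the "is it running" check.
iwpmd_needs_start() {
	local link_type=$1 is_running=${2:-iwpmd_running}
	[[ ${link_type} == siw ]] && ! "${is_running}"
}

# Intended use (requires root, so shown as comments only):
#   link_type=$(pick_soft_rdma_type) || { echo "SKIP: no soft-RDMA"; exit 0; }
#   sudo rdma link add test_rdma0 type "${link_type}" netdev "${rdma_netdev}"
#   started_iwpmd=0
#   if iwpmd_needs_start "${link_type}"; then sudo iwpmd; started_iwpmd=1; fi
```

Recording `started_iwpmd` at launch time is what lets cleanup stop the daemon only when the test started it, leaving a pre-existing system iwpmd untouched.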

CI

  • Ganesha is now built with -DUSE_NFS_RDMA=ON (+ librdmacm-dev / libibverbs-dev).
  • New step loads the soft-RDMA kernel modules (linux-modules-extra) and logs whether the RDMA path will run or skip on this runner.

Testing

  • Local: RDMA test passes (mount over siw + 4 MiB read/write roundtrip); the existing non-RDMA Ganesha suite passes on the V9.15 + RDMA build.

  • CI: check the "Enable soft-RDMA kernel modules" step, which reports whether the GH runner kernel ships siw/rdma_rxe. If it does not, the RDMA test is skipped rather than run.
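
For illustration, a read/write roundtrip of the kind described above could look like the sketch below. It is not the PR's test code: `roundtrip_check` is a hypothetical helper, and the mount options in the trailing comment (NFS version, export path, mountpoint) are assumptions; 20049 is the conventional NFS/RDMA port.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a write/read-back check; not the PR's test code.

# Write $2 bytes of random data into directory $1, read the file back, and
# compare it byte-for-byte with the local source. Returns non-zero on any
# copy failure or mismatch, and removes its temporary files either way.
roundtrip_check() {
	local target_dir=$1 size_bytes=$2
	local src rc=0
	src=$(mktemp) || return 1
	head -c "${size_bytes}" /dev/urandom >"${src}"
	cp "${src}" "${target_dir}/roundtrip.bin" || { rm -f "${src}"; return 1; }
	cmp -s "${src}" "${target_dir}/roundtrip.bin" || rc=1
	rm -f "${src}" "${target_dir}/roundtrip.bin"
	return "${rc}"
}

# On a real run the directory would be an NFS/RDMA mountpoint, for example
# (assumed options and paths):
#   sudo mount -t nfs -o vers=4.1,proto=rdma,port=20049 "${server_ip}:/export" /mnt/rdma
#   roundtrip_check /mnt/rdma $((4 * 1024 * 1024))
```

Because the read-back happens on the same client, it may be served from the page cache; a stricter check would remount or drop caches between the write and the compare.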

Signed-off-by: Crash <crash@leil.io>

ralcolea added 2 commits June 12, 2026 16:55
Bump the pinned Ganesha from V9.2 to V9.15 (upstream adb062b, ntirpc
7.2) and adapt the SaunaFS FSAL to its API.

Upstream changed the readlink FSAL method: readlink_() now receives a
utf8string instead of a gsh_buffdesc, so it is updated to fill the
result via copy_into_utf8string().

V9.15 also brings upstream fixes and a more mature NFS-over-RDMA
implementation; RDMA is enabled in the Ganesha build here. An
integration test for it follows in a separate commit.

Signed-off-by: Crash <crash@leil.io>

Add a GaneshaTests case that validates the SaunaFS FSAL over NFSv4
RPC-over-RDMA. RDMA is a transport-layer feature in Ganesha/libntirpc
and is FSAL-independent, so a successful RDMA mount of a SaunaFS export
plus a file read/write roundtrip proves the FSAL works over RDMA.

The test brings up a soft-RDMA device (SoftiWARP, falling back to
SoftRoCE) bound to a real NIC and mounts via its routable IP; a loopback
device opens the port but does not deliver the RPC data path for a
same-host mount. For siw it starts the iWARP port mapper (iwpmd) when
needed and stops it on cleanup only if the test launched it.

Supporting changes:
- setup_machine.sh: sudoers for modprobe (siw/rdma_rxe/rpcrdma), rdma
  link add/delete, and iwpmd start/stop.
- install-packages.sh: rdma-core + ibverbs-utils on apt, rdma-core +
  libibverbs-utils on dnf (iwpmd, librdmacm, ibv_devices/ibv_devinfo).

Signed-off-by: Crash <crash@leil.io>

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request updates NFS-Ganesha to version 9.15, refactors the readlink_ function in handle.c to use utf8string, and introduces automated testing for NFSv4 over RDMA. Feedback on the new test script highlights several issues, including a missing bash shebang, a potential command failure when no default network route is available, and an overwritten EXIT trap that could leak the iwpmd daemon if the script fails early. Suggestions are provided to define the cleanup function earlier and handle empty network devices gracefully.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +37 to +42
rdma_netdev=$(ip route show default 2>/dev/null | awk '{print $5; exit}')
server_ip=$(ip -o -4 addr show "${rdma_netdev}" 2>/dev/null | awk '{print $4}' | cut -d/ -f1 | head -1)
if [[ -z ${rdma_netdev} || -z ${server_ip} ]]; then
	rdma_netdev=lo
	server_ip=127.0.0.1
fi


high

If there is no default route (which is common in isolated CI environments or containers), rdma_netdev will be empty. On line 38, executing ip -o -4 addr show "" will fail with exit code 1. If the script or the test runner enables both set -e and pipefail, that failure propagates through the pipeline and terminates the script on line 38, before it reaches the fallback logic on lines 39-42. With set -e alone the pipeline's status comes from head, so the script continues.

To prevent this, only query the IP address if rdma_netdev is non-empty. Additionally, to maintain consistency with the existing network address naming conventions in this module, use 'localhost' instead of '127.0.0.1' as the fallback server IP.

Suggested change
-rdma_netdev=$(ip route show default 2>/dev/null | awk '{print $5; exit}')
-server_ip=$(ip -o -4 addr show "${rdma_netdev}" 2>/dev/null | awk '{print $4}' | cut -d/ -f1 | head -1)
-if [[ -z ${rdma_netdev} || -z ${server_ip} ]]; then
-	rdma_netdev=lo
-	server_ip=127.0.0.1
-fi
+rdma_netdev=$(ip route show default 2>/dev/null | awk '{print $5; exit}')
+if [[ -n ${rdma_netdev} ]]; then
+	server_ip=$(ip -o -4 addr show "${rdma_netdev}" 2>/dev/null | awk '{print $4}' | cut -d/ -f1 | head -1)
+else
+	server_ip=""
+fi
+if [[ -z ${rdma_netdev} || -z ${server_ip} ]]; then
+	rdma_netdev=lo
+	server_ip=localhost
+fi
References
  1. Maintain consistency with existing network address naming conventions (e.g., using 'localhost') within a specific test module, even if alternative forms (e.g., '127.0.0.1') are generally considered more reliable, unless a global refactoring is intended.
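
Whether the empty-device case actually aborts the script depends on which shell options are in effect. The sketch below is a stand-in demo, not part of the PR: `false` plays the role of the failing `ip` call, and `survives_failed_pipeline` is a hypothetical helper that runs the same pipeline-in-assignment shape in a child bash.

```shell
#!/usr/bin/env bash
# Stand-in demo: `false` plays the role of the failing `ip ... show ""` call.

# Run a pipeline assignment in a child bash with the given `set` options and
# print "continued" only if execution got past the assignment.
survives_failed_pipeline() {
	bash -c "set $1; value=\$(false | head -1); echo continued" 2>/dev/null
}

# With errexit alone the pipeline status is head's (0), so the script goes on.
echo "set -e only:      [$(survives_failed_pipeline '-e')]"
# With pipefail the status of `false` wins and errexit ends the script.
echo "set -eo pipefail: [$(survives_failed_pipeline '-eo pipefail')]"
# prints:
#   set -e only:      [continued]
#   set -eo pipefail: [continued is absent, shown as []]
```

Guarding the `ip addr show` call on a non-empty device name, as in the suggestion above, avoids the problem under either option set.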

	echo "SKIP: kernel has no SoftiWARP (siw) or SoftRoCE (rdma_rxe) support."
	exit 0
fi
trap 'sudo rdma link delete "${rdma_dev}" 2>/dev/null || true' EXIT


high

The script registers an EXIT trap on line 55 to delete the RDMA link. However, on line 96, it overwrites this trap with trap test_error_cleanup EXIT.

In Bash, registering a new trap on the same signal (EXIT) completely overwrites the previous one. If the script exits or fails between starting iwpmd (line 64) and setting the new trap (line 96), the iwpmd daemon will be left running in the background as root.

To fix this, define test_error_cleanup early in the script (e.g., right after started_iwpmd=0) and register it as the single EXIT trap. This ensures all resources (both the RDMA link and iwpmd) are always cleaned up safely regardless of when the script exits.

Suggested change
-trap 'sudo rdma link delete "${rdma_dev}" 2>/dev/null || true' EXIT
+test_error_cleanup() {
+	set +e
+	if [[ -n "${mountpoint_path}" ]] && mountpoint -q "${mountpoint_path}"; then
+		sudo umount -l "${mountpoint_path}"
+	fi
+	sudo pkill -9 ganesha.nfsd 2>/dev/null
+	sudo rdma link delete "${rdma_dev}" 2>/dev/null
+	if [[ ${started_iwpmd:-0} == 1 ]]; then
+		sudo pkill -x iwpmd 2>/dev/null
+	fi
+}
+trap test_error_cleanup EXIT
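
The overwrite behaviour this comment relies on can be seen in isolation with a short demo. The handlers here only echo placeholder messages instead of touching real resources, and the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Demo: a second `trap ... EXIT` replaces the first instead of stacking.

# Overwriting: only the second handler runs when the subshell exits.
overwritten=$(
	trap 'echo "delete rdma link"' EXIT
	trap 'echo "stop iwpmd"' EXIT
)

# Single handler: one cleanup function covers every resource.
combined=$(
	cleanup() {
		echo "delete rdma link"
		echo "stop iwpmd"
	}
	trap cleanup EXIT
)

printf 'overwritten: %s\n' "${overwritten}"
printf 'combined: %s\n' "${combined//$'\n'/, }"
# prints:
#   overwritten: stop iwpmd
#   combined: delete rdma link, stop iwpmd
```

Registering one cleanup function early, and letting it test flags such as `started_iwpmd`, keeps every exit path covered without depending on the order of trap registration.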

@@ -0,0 +1,174 @@
# Verify the SaunaFS FSAL can serve NFSv4 over RPC-over-RDMA (NFS/RDMA).


medium

The script is missing a shebang (#!/usr/bin/env bash) at the top of the file. Without it, if the script is executed directly, the system might default to /bin/sh, which does not support the bash-specific syntax used throughout this script (such as [[ ... ]] double brackets, arithmetic evaluation (( ... )), and local variables), leading to syntax errors and test failures.

Additionally, it is highly recommended to enable set -eo pipefail to ensure that any unexpected command failures are caught immediately rather than silently ignored.

Suggested change
-# Verify the SaunaFS FSAL can serve NFSv4 over RPC-over-RDMA (NFS/RDMA).
+#!/usr/bin/env bash
+# Verify the SaunaFS FSAL can serve NFSv4 over RPC-over-RDMA (NFS/RDMA).
+set -eo pipefail

Comment on lines +85 to +96
test_error_cleanup() {
	set +e
	if mountpoint -q "${mountpoint_path}"; then
		sudo umount -l "${mountpoint_path}"
	fi
	sudo pkill -9 ganesha.nfsd
	sudo rdma link delete "${rdma_dev}" 2>/dev/null
	if [[ ${started_iwpmd:-0} == 1 ]]; then
		sudo pkill -x iwpmd 2>/dev/null
	fi
}
trap test_error_cleanup EXIT


medium

Since test_error_cleanup is now defined earlier to prevent resource leaks during setup, this duplicate definition and trap registration should be removed.

Suggested change
-test_error_cleanup() {
-	set +e
-	if mountpoint -q "${mountpoint_path}"; then
-		sudo umount -l "${mountpoint_path}"
-	fi
-	sudo pkill -9 ganesha.nfsd
-	sudo rdma link delete "${rdma_dev}" 2>/dev/null
-	if [[ ${started_iwpmd:-0} == 1 ]]; then
-		sudo pkill -x iwpmd 2>/dev/null
-	fi
-}
-trap test_error_cleanup EXIT
+# test_error_cleanup is defined earlier to prevent resource leaks during setup

Copilot AI left a comment


Pull request overview

This PR upgrades the in-tree NFS-Ganesha FSAL integration from Ganesha v9.2 to v9.15 (ntirpc 7.2), enables building Ganesha with NFS-over-RDMA support, and adds an integration test that mounts and exercises a SaunaFS export over RPC-over-RDMA.

Changes:

  • Bump pinned NFS-Ganesha version to v9.15 across CMake, CI workflow, and the Ganesha build Dockerfile, and enable -DUSE_NFS_RDMA=ON with required RDMA build deps.
  • Update the FSAL readlink_() implementation to the newer Ganesha API (now using utf8string).
  • Add an RDMA mount + read/write roundtrip integration test plus machine setup/package dependencies to support it.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/test_suites/GaneshaTests/test_nfs_ganesha_rdma_mount_and_io.sh | Adds a new integration test that mounts an export over RDMA and validates I/O. |
| tests/setup_machine.sh | Extends test sudoers setup to allow commands needed for RDMA test execution. |
| tests/install-packages.sh | Installs runtime tools needed by the RDMA test (rdma, ibv_*, iwpmd). |
| tests/ci_build/ganesha/Dockerfile | Builds Ganesha v9.15 with RDMA enabled and includes RDMA build dependencies. |
| src/nfs-ganesha/handle.c | Adapts FSAL readlink_() to the v9.15 API (utf8string). |
| cmake/Libraries.cmake | Updates the external download pin to NFS-Ganesha v9.15. |
| .github/workflows/run-unit-and-ganesha-tests.yml | Updates CI to build/install Ganesha v9.15 with RDMA enabled and attempts to load soft-RDMA modules. |

Comment on lines +27 to +30
if ! is_program_installed rdma || ! is_program_installed ibv_devices; then
	echo "SKIP: iproute2 'rdma' or ibverbs-utils 'ibv_devices' not installed."
	exit 0
fi
Comment thread tests/setup_machine.sh
Comment on lines +187 to +191
saunafstest ALL = NOPASSWD: /usr/sbin/modprobe rpcrdma
saunafstest ALL = NOPASSWD: /usr/bin/rdma link add *
saunafstest ALL = NOPASSWD: /usr/bin/rdma link delete *
saunafstest ALL = NOPASSWD: /usr/sbin/iwpmd
saunafstest ALL = NOPASSWD: /usr/bin/pkill -x iwpmd
Comment thread src/nfs-ganesha/handle.c
Comment on lines 1841 to 1844
* @returns: FSAL status
*/
-static fsal_status_t readlink_(struct fsal_obj_handle *objectHandle, struct gsh_buffdesc *buffer,
+static fsal_status_t readlink_(struct fsal_obj_handle *objectHandle, utf8string *buffer,
                                bool refresh) {
Comment on lines +179 to +189
- name: Enable soft-RDMA kernel modules
  run: |
    # Needed by RDMA tests. SoftiWARP (siw) / SoftRoCE (rdma_rxe) usually ship
    # in linux-modules-extra.
    sudo apt-get install -y "linux-modules-extra-$(uname -r)" || true
    sudo modprobe siw 2>/dev/null || sudo modprobe rdma_rxe 2>/dev/null || true
    if modinfo siw >/dev/null 2>&1 || modinfo rdma_rxe >/dev/null 2>&1; then
      echo "soft-RDMA module available; RDMA test will run"
    else
      echo "soft-RDMA module NOT available on this kernel; RDMA test will skip"
    fi