glusterd snapshot (ZFS): `snapshot delete` reports success when the backend `zfs destroy` fails, leaving an untracked ZFS snapshot

## Environment / provenance

- **Component:** glusterd — snapshot ZFS provider.
- **Source state:** present on `devel` at commit `ae1d69672f` (the line numbers below are at that commit),
  unchanged since introduction: the provider's `zfs clone`/`zfs destroy` lines come from `8a17e531c05` (the
  ZFS provider, PR #2855, 2021-12-25); the generic swallow in `glusterd_snapshot_remove` from the snapshot
  pluginization commit `1787def0ab1` (2021-12-17).
- **Reproduced on:** a shipping GlusterFS **11.2** + ZFS **2.4.2** system (single-brick volume). The defect
  is per-brick (the provider acts per brick; the swallow is in the generic per-snapshot remove path), so a
  multi-brick volume would orphan one backend snapshot per affected brick — not separately reproduced here.

## Summary

`snapshot clone` (and `snapshot restore`) create the backend snapshot's clone with `zfs clone` and never
`zfs promote` it, so the clone keeps `origin == <snapshot>` — a normal ZFS dependency. A later
`snapshot delete` of that source snapshot runs a plain `zfs destroy <snapshot>` (no `-R`), which ZFS
correctly refuses while a dependent clone exists, returning a nonzero status. glusterd logs that failure at
ERROR but then resets the transaction return code to 0 during directory cleanup, so the delete reports
success: the CLI prints `snap removed successfully` (exit 0) and glusterd drops the snapshot object, while
the backend ZFS snapshot remains — now untracked by glusterd.

The defect is the silent-to-the-operator success on a real backend failure. (It is logged at ERROR in
`glusterd.log`, but the CLI/RPC result is success.) The dependent clone itself is a normal, glusterd-tracked
clone volume; the leaked object is the backend ZFS snapshot it holds.

## Affected code

Both clone and restore build the clone with `zfs clone`, never `zfs promote`
(`xlators/mgmt/glusterd/src/snapshot/glusterd-zfs-snapshot.c`):

```c
/* snapshot clone   -> .clone   -> glusterd_zfs_snapshot_create_or_clone(): */
    runner_add_args(&runner, ZFS_COMMAND, "clone", snap_device, clone_device, NULL);  /* line 179 */
/* snapshot restore -> .restore -> glusterd_zfs_snapshot_restore():          */
    runner_add_args(&runner, ZFS_COMMAND, "clone", snap_device, clone_device, NULL);  /* line 430 */
```

Remove is a plain `zfs destroy`, no `-R` (its only removal primitive — there is no `zfs promote`/`-R`
anywhere in the provider), `glusterd_zfs_snapshot_remove`:

```c
    runner_add_args(&runner, ZFS_COMMAND, "destroy", snap_device, NULL);  /* line 332 */
    ret = runner_run(&runner);
    if (ret) { gf_msg(... GD_MSG_SNAP_REMOVE_FAIL ...); goto out; }       /* returns nonzero */
```

The nonzero return is dropped by the caller, `glusterd_snapshot_remove` (`glusterd-snapshot.c`):

```c
    glusterd_snapshot_plugin_by_name(snap_vol->snap_plugin, &snap_ops);
    ret = snap_ops->remove(brickinfo, ...);      /* ~line 2600 */
    if (ret) {
        gf_msg(... GD_MSG_SNAP_REMOVE_FAIL ...);  /* ~line 2603: logged at ERROR — but no `goto out` */
    }
    ret = sys_rmdir(snap_path);                   /* ~2612: overwrites the failure with the cleanup result … */
    if (ret) { ret = 0; ... }                     /* ~2616/2637: … and forces ret = 0 even if rmdir failed */
out:
    return ret;                                   /* 0 — the backend failure has been overwritten */
```

So the failure is logged but execution falls through; `ret` is then reassigned by `sys_rmdir` (and forced to
0 if that also fails), so the original backend-removal failure is unconditionally overwritten and the
transaction returns success. (`snap_ops->remove` dispatches through the plugin vtable — selected in
`glusterd-snapshot-utils.c` — so this caller is shared by the LVM backend too.)

## Root cause

A ZFS snapshot with a dependent clone genuinely cannot be removed by a plain `zfs destroy` — that is
correct ZFS behaviour, because the clone shares the snapshot's blocks. The bug is not that the destroy
fails; it is that `glusterd_snapshot_remove` does not propagate that failure (it logs it, falls through, and
a later cleanup step resets the return code to 0), so a recoverable backend failure is reported to the
operator as success and the snapshot object is dropped.

## Reproduction

Single-brick volume on a ZFS dataset (so the provider selects `zfs_snap_ops`):

```sh
gluster volume create gv0 <host>:/<pool-mount>/gv0/brick force && gluster volume start gv0
gluster snapshot create snap1 gv0 no-timestamp     # -> zfs snapshot pool/gv0@<snapid>
gluster snapshot activate snap1
gluster snapshot clone clonevol snap1              # -> zfs clone (line 179): clonevol's brick.origin = snap1's snapshot
gluster snapshot delete snap1                      # reports success (see below)
```

Observed output (GlusterFS 11.2, ZFS 2.4.2; identifiers shortened):

```
$ gluster snapshot delete snap1 ; echo "exit=$?"
snapshot delete: snap1: snap removed successfully
exit=0

$ gluster snapshot list
No snapshots present                                 # glusterd no longer tracks snap1

$ zfs list -t snapshot
pool/gv0@<snapid>                                    # the snapshot is still here

$ zfs get -H -o value origin pool/gv0/<cloneid>      # clonevol's brick still depends on it:
pool/gv0@<snapid>
$ zfs get -H -o value clones pool/gv0@<snapid>
pool/gv0/<cloneid>

$ zfs destroy pool/gv0@<snapid>                      # the provider's plain destroy still can't remove it:
cannot destroy 'pool/gv0@<snapid>': snapshot has dependent clones
use '-R' to destroy the following datasets:
pool/gv0/<cloneid>
```

The matching glusterd.log lines for that delete (logged at ERROR while the CLI returned success):

```
E [MSGID: 106043] [glusterd-zfs-snapshot.c:337:glusterd_zfs_snapshot_remove] 0-management: removing snapshot of the brick (<brick-path>) of snapshot snap1 failed
E [MSGID: 106043] [glusterd-snapshot.c:2603:glusterd_snapshot_remove] 0-management: Failed to remove the snapshot <brick-path>/.zfs/snapshot/<snapid>/brick (snap1)
```

`snapshot restore` is also implemented via an un-promoted `zfs clone` (line 430), creating the same kind of
snapshot↔clone dependency; only the `snapshot clone` path above was reproduced end-to-end (it is the
cleanest trigger).

## Impact

A `snapshot delete` that did not remove the backend snapshot is reported to the operator as success, and
glusterd forgets the snapshot. The ZFS snapshot is left behind — no longer removable by the provider's plain
`zfs destroy` while the clone depends on it, and no longer referenced by glusterd, so nothing will ever
retry it. It accumulates whenever a snapshot is cloned-from (or restored-from) and that source snapshot is
later deleted while the clone still exists. This is not data loss (no `-R` is issued), but it is a correctness
bug (success reported for a failed operation) and a steady consumer of pool space that is invisible to
`gluster snapshot list`.

## Suggested fix

The delete should fail with a clear, actionable error instead of reporting success — e.g. *"cannot delete
snapshot snap1: it is in use by clone volume `clonevol`; delete the clone first."* The dependent clone is
discoverable via `zfs get -H -o value clones <snap_device>`.

The clean place to reject this is the **prevalidate** phase. `glusterd_snapshot_remove_prevalidate()`
(dispatched for delete by `glusterd_snapshot_prevalidate`) already runs before the op commits; having the
ZFS provider expose a "has dependent clones" check there lets the delete be refused before any state
changes.

This phase choice matters: today the failure surfaces only at commit time, *after*
`glusterd_snapshot_remove_commit` has already logged "marked snap … for decommission" (see the log above).
Merely propagating the error from `glusterd_snapshot_remove` (which currently logs it, then has `ret`
overwritten by the `sys_rmdir` cleanup) would fail the op but could leave a partially-decommissioned
snapshot. Rejecting in prevalidate avoids that half-state; a commit-phase guard, if also wanted, needs
proper rollback. Exact placement is the implementer's call.

Either way, the existing tolerant behaviour for a backend snapshot that is *already gone* (where continuing
is correct) should be preserved — only the has-dependent-clone case should be refused. A blanket change to
stop the caller overwriting `ret` is **not** proposed: that path is shared with the LVM backend and the
auto-delete/GC callers and would need a wider audit.

### Why not `zfs promote`?

The standard ZFS way to drop a snapshot that has dependents is `zfs promote`, but it is not a safe fix here
(observed on ZFS 2.4.2): `zfs promote <clone>` does not free the snapshot — it relocates it into the clone's
namespace and swaps roles, so the original snapshot name no longer exists (a follow-up
`zfs destroy <snap_device>` then exits 1, "could not find any snapshots to destroy"), and it makes the
*live* brick a dependent clone of the snapshot-clone volume. After that, a `zfs destroy -R` of the clone
volume would also destroy the live brick. Failing the delete (above) avoids creating that hidden
dependency.

## Notes

- A PR for the provider-scoped fix can follow; happy to align on the preferred shape first.
- A formal model of the snapshot lifecycle and the full annotated testbed transcript (including the
  `zfs promote` relocation/cascade observations) are available on request.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

glusterd snapshot (ZFS): `snapshot delete` reports success when the backend `zfs destroy` fails, leaving an untracked ZFS snapshot #4689

Environment / provenance

Summary

Affected code

Root cause

Reproduction

Impact

Suggested fix

Why not `zfs promote`?

Notes

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

glusterd snapshot (ZFS): snapshot delete reports success when the backend zfs destroy fails, leaving an untracked ZFS snapshot #4689

Description

Environment / provenance

Summary

Affected code

Root cause

Reproduction

Impact

Suggested fix

Why not zfs promote?

Notes

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

glusterd snapshot (ZFS): `snapshot delete` reports success when the backend `zfs destroy` fails, leaving an untracked ZFS snapshot #4689

Why not `zfs promote`?