Commit def8c48

oshogbo, wca, and allanjude committed
zfs: support force exporting pools
This is primarily of use when a pool has lost its disk, while the user
doesn't care about any pending (or otherwise) transactions.

Implement various control methods to make this feasible:
- txg_wait can now take a NOSUSPEND flag, in which case the caller will
  be alerted if their txg can't be committed. This is primarily of
  interest for callers that would normally pass TXG_WAIT, but don't want
  to wait if the pool becomes suspended, which allows unwinding in some
  cases, specifically when one is attempting a non-forced export.
  Without this, the non-forced export would preclude a forced export by
  virtue of holding the namespace lock indefinitely.
- txg_wait also returns failure for TXG_WAIT users if a pool is actually
  being force exported. Adjust most callers to tolerate this.
- spa_config_enter_flags now takes a NOSUSPEND flag to the same effect.
- DMU objset initiator which may be set on an objset being forcibly
  exported / unmounted.
- SPA export initiator may be set on a pool being forcibly exported.
- DMU send/recv now use an interruption mechanism which relies on the
  SPA export initiator being able to enumerate datasets and closing any
  send/recv streams, causing their EINTR paths to be invoked.
- ZIO now has a cancel entry point, which tells all suspended zios to
  fail, and which suppresses the failures for non-CANFAIL users.
- metaslab, etc. cleanup, which consists of simply throwing away any
  changes that were not able to be synced out.
- Linux specific: introduce a new tunable,
  zfs_forced_export_unmount_enabled, which allows the filesystem to
  remain in a modified 'unmounted' state upon exiting zpl_umount_begin,
  to achieve parity with FreeBSD and illumos, which have VFS-level
  support for yanking filesystems out from under users. However, this
  only helps when the user is actively performing I/O, while not sitting
  on the filesystem. In particular, this allows test #3 below to pass on
  Linux.
- Add basic logic to zpool to indicate a force-exporting pool, instead
  of crashing due to lack of config, etc.

Add tests which cover the basic use cases:
- Force export while a send is in progress
- Force export while a recv is in progress
- Force export while POSIX I/O is in progress

This change modifies the libzfs ABI:
- New ZPOOL_STATUS_FORCE_EXPORTING zpool_status_t enum value.
- New field libzfs_force_export for libzfs_handle.

Co-Authored-by: Will Andrews <will@firepipe.net>
Co-Authored-by: Allan Jude <allan@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Catalogics, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes openzfs#3461
Signed-off-by: Will Andrews <will@firepipe.net>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
1 parent b1a260b commit def8c48
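
The txg_wait behaviour described in the commit message (fail instead of waiting when NOSUSPEND is passed and the pool is suspended, and fail for every waiter once a forced export is under way) can be modelled in user space. The following is a minimal sketch, not the ZFS implementation: `model_pool`, `model_txg_wait`, the flag `MODEL_WAIT_NOSUSPEND`, the `MODEL_WOULD_BLOCK` sentinel, and the use of `EAGAIN` and `EIO` as error codes are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag; the real flag names and values are not shown here. */
#define	MODEL_WAIT_NOSUSPEND	(1U << 0)

/* Sentinel meaning "a real implementation would sleep here". */
#define	MODEL_WOULD_BLOCK	(-1)

struct model_pool {
	bool suspended;		/* pool lost its disk, I/O is suspended */
	bool force_exporting;	/* a forced export is in progress */
	uint64_t synced_txg;	/* last txg known to be on disk */
};

/*
 * Returns 0 if txg is already committed, an errno value if the caller
 * must unwind, or MODEL_WOULD_BLOCK if the caller would keep waiting.
 */
int
model_txg_wait(const struct model_pool *p, uint64_t txg, unsigned flags)
{
	assert(p != NULL);

	if (txg <= p->synced_txg)
		return (0);
	/* A forced export fails every waiter, with or without NOSUSPEND. */
	if (p->force_exporting)
		return (EIO);
	/* NOSUSPEND callers are alerted instead of waiting on suspension. */
	if (p->suspended && (flags & MODEL_WAIT_NOSUSPEND) != 0)
		return (EAGAIN);
	return (MODEL_WOULD_BLOCK);
}
```

In this model a non-forced export would pass `MODEL_WAIT_NOSUSPEND`, receive an error on a suspended pool, and release its locks, leaving room for a later forced export to proceed.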

45 files changed, +355 -83 lines changed

cmd/zpool/zpool_main.c

Lines changed: 2 additions & 2 deletions
@@ -1843,7 +1843,7 @@ zpool_do_destroy(int argc, char **argv)
 		return (1);
 	}

-	if (zpool_disable_datasets(zhp, force) != 0) {
+	if (zpool_disable_datasets(zhp, force, FALSE) != 0) {
 		(void) fprintf(stderr, gettext("could not destroy '%s': "
 		    "could not unmount datasets\n"), zpool_get_name(zhp));
 		zpool_close(zhp);
@@ -1873,7 +1873,7 @@ zpool_export_one(zpool_handle_t *zhp, void *data)
 {
 	export_cbdata_t *cb = data;

-	if (zpool_disable_datasets(zhp, cb->force || cb->hardforce) != 0)
+	if (zpool_disable_datasets(zhp, cb->force, cb->hardforce) != 0)
 		return (1);

 	/* The history must be logged as part of the export */

include/libzfs.h

Lines changed: 10 additions & 2 deletions
@@ -924,8 +924,16 @@ int zfs_smb_acl_rename(libzfs_handle_t *, char *, char *, char *, char *);
  * Enable and disable datasets within a pool by mounting/unmounting and
  * sharing/unsharing them.
  */
-extern int zpool_enable_datasets(zpool_handle_t *, const char *, int);
-extern int zpool_disable_datasets(zpool_handle_t *, boolean_t);
+_LIBZFS_H int zpool_enable_datasets(zpool_handle_t *, const char *, int);
+_LIBZFS_H int zpool_disable_datasets(zpool_handle_t *, boolean_t, boolean_t);
+_LIBZFS_H void zpool_disable_datasets_os(zpool_handle_t *, boolean_t);
+_LIBZFS_H void zpool_disable_volume_os(const char *);
+
+/*
+ * Procedure to inform os that we have started force unmount (linux specific).
+ */
+_LIBZFS_H void zpool_unmount_mark_hard_force_begin(zpool_handle_t *zhp);
+_LIBZFS_H void zpool_unmount_mark_hard_force_end(zpool_handle_t *zhp);

 /*
  * Parse a features file for -o compatibility

include/os/freebsd/zfs/sys/zfs_znode_impl.h

Lines changed: 2 additions & 0 deletions
@@ -134,6 +134,8 @@ extern minor_t zfsdev_minor_alloc(void);
 /* Called on entry to each ZFS vnode and vfs operation */
 #define ZFS_ENTER(zfsvfs) ZFS_ENTER_ERROR(zfsvfs, EIO)

+#define ZFS_ENTER_UNMOUNTOK ZFS_ENTER
+
 /* Must be called before exiting the vop */
 #define ZFS_EXIT(zfsvfs) ZFS_TEARDOWN_EXIT_READ(zfsvfs, FTAG)

include/os/linux/spl/sys/thread.h

Lines changed: 8 additions & 1 deletion
@@ -56,7 +56,7 @@ typedef void (*thread_func_t)(void *);
 /* END CSTYLED */

 #define thread_signal(t, s) spl_kthread_signal(t, s)
-#define thread_exit() __thread_exit()
+#define thread_exit() spl_thread_exit()
 #define thread_join(t) VERIFY(0)
 #define curthread current
 #define getcomm() current->comm
@@ -70,6 +70,13 @@ extern struct task_struct *spl_kthread_create(int (*func)(void *),
     void *data, const char namefmt[], ...);
 extern int spl_kthread_signal(kthread_t *tsk, int sig);

+static inline __attribute__((noreturn)) void
+spl_thread_exit(void)
+{
+	tsd_exit();
+	SPL_KTHREAD_COMPLETE_AND_EXIT(NULL, 0);
+}
+
 extern proc_t p0;

 #ifdef HAVE_SIGINFO

include/sys/dmu.h

Lines changed: 1 addition & 1 deletion
@@ -594,7 +594,7 @@ void dmu_buf_add_ref(dmu_buf_t *db, void* tag);
 boolean_t dmu_buf_try_add_ref(dmu_buf_t *, objset_t *os, uint64_t object,
     uint64_t blkid, void *tag);

-void dmu_buf_rele(dmu_buf_t *db, void *tag);
+void dmu_buf_rele(dmu_buf_t *db, const void *tag);
 uint64_t dmu_buf_refcount(dmu_buf_t *db);
 uint64_t dmu_buf_user_refcount(dmu_buf_t *db);

include/sys/dmu_recv.h

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@
 #include <sys/spa.h>
 #include <sys/objlist.h>

-extern const char *recv_clone_name;
+extern const char *const recv_clone_name;

 typedef struct dmu_recv_cookie {
 	struct dsl_dataset *drc_ds;

include/sys/dsl_dataset.h

Lines changed: 3 additions & 3 deletions
@@ -243,7 +243,7 @@ typedef struct dsl_dataset {
 	kmutex_t ds_sendstream_lock;
 	list_t ds_sendstreams;

-	void *ds_receiver; /* really a dmu_recv_cookie_t */
+	struct dmu_recv_cookie *ds_receiver;

 	/*
 	 * When in the middle of a resumable receive, tracks how much
@@ -331,10 +331,10 @@ boolean_t dsl_dataset_try_add_ref(struct dsl_pool *dp, dsl_dataset_t *ds,
     void *tag);
 int dsl_dataset_create_key_mapping(dsl_dataset_t *ds);
 int dsl_dataset_hold_obj_flags(struct dsl_pool *dp, uint64_t dsobj,
-    ds_hold_flags_t flags, void *tag, dsl_dataset_t **);
+    ds_hold_flags_t flags, const void *tag, dsl_dataset_t **);
 void dsl_dataset_remove_key_mapping(dsl_dataset_t *ds);
 int dsl_dataset_hold_obj(struct dsl_pool *dp, uint64_t dsobj,
-    void *tag, dsl_dataset_t **);
+    const void *tag, dsl_dataset_t **);
 void dsl_dataset_rele_flags(dsl_dataset_t *ds, ds_hold_flags_t flags,
     void *tag);
 void dsl_dataset_rele(dsl_dataset_t *ds, void *tag);
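
Several prototypes in this commit change a `void *tag` parameter to `const void *tag`. One practical effect is that a caller can pass a pointer to constant data, such as a function-name string, without casting away `const`. The sketch below illustrates that idea with a toy hold/release tracker; `model_ref`, `model_hold`, `model_rele`, and `model_use_object` are hypothetical names and are not part of ZFS.

```c
#include <assert.h>
#include <stddef.h>

#define	MODEL_MAX_HOLDS	8

/* Hypothetical reference tracker; tags are compared by address only. */
struct model_ref {
	const void *tags[MODEL_MAX_HOLDS];
	size_t count;
};

/* Record a hold identified by tag. Returns 0 on success, -1 if full. */
int
model_hold(struct model_ref *r, const void *tag)
{
	assert(r != NULL);
	if (r->count == MODEL_MAX_HOLDS)
		return (-1);
	r->tags[r->count++] = tag;
	return (0);
}

/* Drop the hold taken with tag. Returns 0 on success, -1 if not found. */
int
model_rele(struct model_ref *r, const void *tag)
{
	assert(r != NULL);
	for (size_t i = 0; i < r->count; i++) {
		if (r->tags[i] == tag) {
			/* Swap-remove: order of holds does not matter. */
			r->tags[i] = r->tags[--r->count];
			return (0);
		}
	}
	return (-1);
}

/* __func__ is a const char array, so it converts to const void * as is. */
int
model_use_object(struct model_ref *r)
{
	if (model_hold(r, __func__) != 0)
		return (-1);
	/* ... work with the held object ... */
	return (model_rele(r, __func__));
}
```

With a non-const `void *tag` parameter, the two `__func__` arguments would need an explicit cast to compile without a qualifier warning.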

include/sys/dsl_scan.h

Lines changed: 6 additions & 2 deletions
@@ -171,8 +171,12 @@ int dsl_scan_cancel(struct dsl_pool *);
 int dsl_scan(struct dsl_pool *, pool_scan_func_t);
 void dsl_scan_assess_vdev(struct dsl_pool *dp, vdev_t *vd);
 boolean_t dsl_scan_scrubbing(const struct dsl_pool *dp);
-int dsl_scrub_set_pause_resume(const struct dsl_pool *dp, pool_scrub_cmd_t cmd);
-int dsl_scan_restart_resilver(struct dsl_pool *, uint64_t txg);
+boolean_t dsl_errorscrubbing(const struct dsl_pool *dp);
+boolean_t dsl_errorscrub_active(dsl_scan_t *scn);
+int dsl_scan_restart_resilver(struct dsl_pool *dp, uint64_t txg);
+int dsl_scrub_set_pause_resume(const struct dsl_pool *dp,
+    pool_scrub_cmd_t cmd);
+void dsl_errorscrub_sync(struct dsl_pool *, dmu_tx_t *);
 boolean_t dsl_scan_resilvering(struct dsl_pool *dp);
 boolean_t dsl_scan_resilver_scheduled(struct dsl_pool *dp);
 boolean_t dsl_dataset_unstable(struct dsl_dataset *ds);

include/sys/fs/zfs.h

Lines changed: 2 additions & 0 deletions
@@ -1380,6 +1380,8 @@ typedef enum zfs_ioc {
 	ZFS_IOC_UNJAIL, /* 0x86 (FreeBSD) */
 	ZFS_IOC_SET_BOOTENV, /* 0x87 */
 	ZFS_IOC_GET_BOOTENV, /* 0x88 */
+	ZFS_IOC_HARD_FORCE_UNMOUNT_BEGIN, /* 0x89 (Linux) */
+	ZFS_IOC_HARD_FORCE_UNMOUNT_END, /* 0x8a (Linux) */
 	ZFS_IOC_LAST
 } zfs_ioc_t;

include/sys/spa.h

Lines changed: 6 additions & 2 deletions
@@ -753,6 +753,7 @@ extern int spa_create(const char *pool, nvlist_t *nvroot, nvlist_t *props,
 extern int spa_import(char *pool, nvlist_t *config, nvlist_t *props,
     uint64_t flags);
 extern nvlist_t *spa_tryimport(nvlist_t *tryconfig);
+extern int spa_set_pre_export_status(const char *pool, boolean_t status);
 extern int spa_destroy(const char *pool);
 extern int spa_checkpoint(const char *pool);
 extern int spa_checkpoint_discard(const char *pool);
@@ -957,10 +958,12 @@ extern void spa_iostats_trim_add(spa_t *spa, trim_type_t type,
     uint64_t extents_skipped, uint64_t bytes_skipped,
     uint64_t extents_failed, uint64_t bytes_failed);

-/* Config lock handling flags */
 typedef enum {
+	/* Config lock handling flags */
 	SCL_FLAG_TRYENTER = 1U << 0,
 	SCL_FLAG_NOSUSPEND = 1U << 1,
+	/* MMP flag */
+	SCL_FLAG_MMP = 1U << 2,
 } spa_config_flag_t;

 extern void spa_import_progress_add(spa_t *spa);
@@ -973,7 +976,8 @@ extern int spa_import_progress_set_state(uint64_t pool_guid,
     spa_load_state_t spa_load_state);

 /* Pool configuration locks */
-extern int spa_config_tryenter(spa_t *spa, int locks, void *tag, krw_t rw);
+extern int spa_config_tryenter(spa_t *spa, int locks, const void *tag,
+    krw_t rw);
 extern int spa_config_enter_flags(spa_t *spa, int locks, const void *tag,
     krw_t rw, spa_config_flag_t flags);
 extern void spa_config_enter(spa_t *spa, int locks, const void *tag, krw_t rw);
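
The `spa_config_flag_t` values in this file are single-bit constants, so they can be combined with bitwise OR and tested individually. The sketch below copies the three enumerators from the diff and adds two helper predicates. The helpers (`model_may_block`, `model_fails_on_suspend`) and the mask `MODEL_SCL_FLAG_ALL` are hypothetical, and the interpretation of each flag is an assumption drawn from the flag names and the commit message, not from the implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Enumerators as they appear in the diff for include/sys/spa.h. */
typedef enum {
	SCL_FLAG_TRYENTER = 1U << 0,
	SCL_FLAG_NOSUSPEND = 1U << 1,
	SCL_FLAG_MMP = 1U << 2,
} spa_config_flag_t;

/* Hypothetical mask of every known flag bit. */
#define	MODEL_SCL_FLAG_ALL	\
	((unsigned)(SCL_FLAG_TRYENTER | SCL_FLAG_NOSUSPEND | SCL_FLAG_MMP))

/* Assumed meaning: without TRYENTER the caller may sleep on the lock. */
bool
model_may_block(unsigned flags)
{
	assert((flags & ~MODEL_SCL_FLAG_ALL) == 0);
	return ((flags & SCL_FLAG_TRYENTER) == 0);
}

/* Assumed meaning: NOSUSPEND callers fail instead of waiting on suspend. */
bool
model_fails_on_suspend(unsigned flags)
{
	assert((flags & ~MODEL_SCL_FLAG_ALL) == 0);
	return ((flags & SCL_FLAG_NOSUSPEND) != 0);
}
```

Because each flag occupies its own bit, adding `SCL_FLAG_MMP` at bit 2 does not change how existing combinations of the first two flags are interpreted.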

include/sys/spa_impl.h

Lines changed: 1 addition & 0 deletions
@@ -245,6 +245,7 @@ struct spa {
 	list_t spa_evicting_os_list; /* Objsets being evicted. */
 	kcondvar_t spa_evicting_os_cv; /* Objset Eviction Completion */
 	kthread_t *spa_export_initiator; /* thread exporting the pool */
+	boolean_t spa_pre_exporting; /* allow fails before export */
 	txg_list_t spa_vdev_txg_list; /* per-txg dirty vdev list */
 	vdev_t *spa_root_vdev; /* top-level vdev container */
 	uint64_t spa_min_ashift; /* of vdevs in normal class */

include/sys/zfs_context.h

Lines changed: 1 addition & 1 deletion
@@ -233,7 +233,7 @@ typedef pthread_t kthread_t;
 	zk_thread_create(func, arg, stksize, state)
 #define thread_create(stk, stksize, func, arg, len, pp, state, pri) \
 	zk_thread_create(func, arg, stksize, state)
-#define thread_signal(t, s) pthread_kill((pthread_t)t, s)
+#define thread_signal(t, s) pthread_kill((pthread_t)(t), s)
 #define thread_exit() pthread_exit(NULL)
 #define thread_join(t) pthread_join((pthread_t)(t), NULL)
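
The `thread_signal` change wraps the macro argument in parentheses before casting it. A cast binds more tightly than most binary operators, so `(type)t` applied to an argument such as `a + b` casts only `a`. The sketch below shows the effect with a narrowing cast on plain integers; the macros and functions are illustrative only and do not involve pthreads, and the arithmetic assumes an 8-bit `unsigned char`.

```c
#include <assert.h>

/* Illustrative macros; they mirror the shape of the change only. */
#define	MODEL_NARROW_UNSAFE(x)	((unsigned char)x)
#define	MODEL_NARROW_SAFE(x)	((unsigned char)(x))

/* Expands to ((unsigned char)a + b): only a is narrowed. */
int
model_unsafe(int a, int b)
{
	return (MODEL_NARROW_UNSAFE(a + b));
}

/* Expands to ((unsigned char)(a + b)): the whole sum is narrowed. */
int
model_safe(int a, int b)
{
	return (MODEL_NARROW_SAFE(a + b));
}

void
model_demo(void)
{
	assert(model_unsafe(250, 10) == 260);	/* 250 + 10, no wraparound */
	assert(model_safe(250, 10) == 4);	/* 260 modulo 256 */
}
```

When the argument is a single identifier, as is typical for a thread handle, both forms behave the same, so the change reads as a defensive fix rather than a behavioural one.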

include/sys/zfs_ioctl_impl.h

Lines changed: 4 additions & 0 deletions
@@ -74,6 +74,10 @@ typedef struct zfs_ioc_key {

 int zfs_secpolicy_config(zfs_cmd_t *, nvlist_t *, cred_t *);

+void zfs_ioctl_register_pool(zfs_ioc_t ioc, zfs_ioc_legacy_func_t *func,
+    zfs_secpolicy_func_t *secpolicy, boolean_t log_history,
+    zfs_ioc_poolcheck_t pool_check);
+
 void zfs_ioctl_register_dataset_nolog(zfs_ioc_t, zfs_ioc_legacy_func_t *,
     zfs_secpolicy_func_t *, zfs_ioc_poolcheck_t);

include/sys/zfs_znode.h

Lines changed: 37 additions & 0 deletions
@@ -215,6 +215,43 @@ typedef struct znode {
 	ZNODE_OS_FIELDS;
 } znode_t;

+/* Verifies the znode is valid. */
+static inline int
+zfs_verify_zp(znode_t *zp)
+{
+	if (unlikely(zp->z_sa_hdl == NULL))
+		return (SET_ERROR(EIO));
+	return (0);
+}
+
+/* zfs_enter and zfs_verify_zp together */
+static inline int
+zfs_enter_verify_zp(zfsvfs_t *zfsvfs, znode_t *zp, const char *tag)
+{
+	int error;
+
+	ZFS_ENTER(zfsvfs);
+	if ((error = zfs_verify_zp(zp)) != 0) {
+		ZFS_EXIT(zfsvfs);
+		return (error);
+	}
+	return (0);
+}
+
+/* zfs_enter_unmountok and zfs_verify_zp together */
+static inline int
+zfs_enter_unmountok_verify_zp(zfsvfs_t *zfsvfs, znode_t *zp, const char *tag)
+{
+	int error;
+
+	ZFS_ENTER_UNMOUNTOK(zfsvfs);
+	if ((error = zfs_verify_zp(zp)) != 0) {
+		ZFS_EXIT(zfsvfs);
+		return (error);
+	}
+	return (0);
+}
+
 typedef struct znode_hold {
 	uint64_t zh_obj; /* object id */
 	avl_node_t zh_node; /* avl tree linkage */
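
The new helpers in this file combine an enter step with a znode validity check and undo the enter if the check fails. The sketch below models that pattern in user space. `MODEL_ENTER` and `MODEL_EXIT` are hypothetical stand-ins for `ZFS_ENTER` and `ZFS_EXIT`; the assumption that the enter macro returns `EIO` from the enclosing function is based on the FreeBSD definition `ZFS_ENTER_ERROR(zfsvfs, EIO)` shown elsewhere in this commit, and the counter-based bookkeeping is invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_fs {
	bool unmounted;	/* teardown has begun */
	int entered;	/* outstanding enter count */
};

struct model_node {
	void *sa_hdl;	/* NULL once the node has been invalidated */
};

/* Hypothetical stand-in for ZFS_ENTER: may return from the caller. */
#define	MODEL_ENTER(fs)				\
	do {					\
		if ((fs)->unmounted)		\
			return (EIO);		\
		(fs)->entered++;		\
	} while (0)

/* Hypothetical stand-in for ZFS_EXIT. */
#define	MODEL_EXIT(fs)	((fs)->entered--)

/* Mirrors the shape of zfs_verify_zp: a node without a handle is invalid. */
int
model_verify_node(const struct model_node *np)
{
	return (np->sa_hdl == NULL ? EIO : 0);
}

/* Enter, then verify; on verification failure, exit before returning. */
int
model_enter_verify(struct model_fs *fs, const struct model_node *np)
{
	int error;

	assert(fs != NULL && np != NULL);
	MODEL_ENTER(fs);
	if ((error = model_verify_node(np)) != 0) {
		MODEL_EXIT(fs);
		return (error);
	}
	return (0);
}
```

The property worth noting is that every failure path leaves the enter count unchanged, so a caller only needs to call the exit macro after a zero return.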

lib/libzfs/libzfs_dataset.c

Lines changed: 0 additions & 1 deletion
@@ -488,7 +488,6 @@ make_dataset_handle(libzfs_handle_t *hdl, const char *path)

 	zhp->zfs_hdl = hdl;
 	(void) strlcpy(zhp->zfs_name, path, sizeof (zhp->zfs_name));
-
 	if (!hdl->libzfs_force_export) {
 		zfs_cmd_t zc = {"\0"};

lib/libzfs/libzfs_mount.c

Lines changed: 11 additions & 3 deletions
@@ -1525,7 +1525,8 @@ mountpoint_compare(const void *a, const void *b)
  * and gather all the filesystems that are currently mounted.
  */
 int
-zpool_disable_datasets(zpool_handle_t *zhp, boolean_t force)
+zpool_disable_datasets(zpool_handle_t *zhp, boolean_t force,
+    boolean_t hardforce)
 {
 	int used, alloc;
 	struct mnttab entry;
@@ -1535,9 +1536,9 @@ zpool_disable_datasets(zpool_handle_t *zhp, boolean_t force)
 	libzfs_handle_t *hdl = zhp->zpool_hdl;
 	int i;
 	int ret = -1;
-	int flags = (force ? MS_FORCE : 0);
+	int flags = ((hardforce || force) ? MS_FORCE : 0);

-	hdl->libzfs_force_export = force;
+	hdl->libzfs_force_export = flags & MS_FORCE;
 	namelen = strlen(zhp->zpool_name);

 	/* Reopen MNTTAB to prevent reading stale data from open file */
@@ -1616,6 +1617,10 @@ zpool_disable_datasets(zpool_handle_t *zhp, boolean_t force)
 	 */
 	qsort(mountpoints, used, sizeof (char *), mountpoint_compare);

+	if (hardforce) {
+		zpool_unmount_mark_hard_force_begin(zhp);
+	}
+
 	/*
 	 * Walk through and first unshare everything.
 	 */
@@ -1660,6 +1665,9 @@ zpool_disable_datasets(zpool_handle_t *zhp, boolean_t force)
 	}
 	free(datasets);
 	free(mountpoints);
+	if (ret != 0 && hardforce) {
+		zpool_unmount_mark_hard_force_end(zhp);
+	}

 	return (ret);
 }
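
The control flow that these hunks add to `zpool_disable_datasets()` can be summarised as: either force flag selects `MS_FORCE`, a hard force marks the start of the forced unmount, and the mark is cleared again only if the unmount step fails. The sketch below models that flow in isolation. `model_state`, `model_disable_datasets`, the mark helpers, and the value of `MODEL_MS_FORCE` are hypothetical; the unshare and unmount work is replaced by a caller-supplied result code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	MODEL_MS_FORCE	0x1	/* hypothetical value standing in for MS_FORCE */

struct model_state {
	int flags;		/* flags handed to the unmount step */
	bool force_export;	/* models hdl->libzfs_force_export */
	int mark_depth;		/* begin calls minus end calls */
};

static void
model_mark_begin(struct model_state *s)
{
	s->mark_depth++;
}

static void
model_mark_end(struct model_state *s)
{
	s->mark_depth--;
}

/* unmount_result stands in for the unshare/unmount work: 0 means success. */
int
model_disable_datasets(struct model_state *s, bool force, bool hardforce,
    int unmount_result)
{
	int ret;

	assert(s != NULL);
	s->flags = (hardforce || force) ? MODEL_MS_FORCE : 0;
	s->force_export = (s->flags & MODEL_MS_FORCE) != 0;

	if (hardforce)
		model_mark_begin(s);

	ret = unmount_result;

	/* Only a failed hard-forced unmount clears the mark here. */
	if (ret != 0 && hardforce)
		model_mark_end(s);
	return (ret);
}
```

In this model the mark stays set after a successful hard-forced unmount. Where the real code clears it on the success path is not visible in this part of the diff.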

lib/libzfs/os/freebsd/libzfs_zmount.c

Lines changed: 26 additions & 0 deletions
@@ -133,3 +133,29 @@ zfs_mount_delegation_check(void)
 {
 	return (0);
 }
+
+/* Called from the tail end of zpool_disable_datasets() */
+void
+zpool_disable_datasets_os(zpool_handle_t *zhp, boolean_t force)
+{
+	(void) zhp, (void) force;
+}
+
+/* Called from the tail end of zfs_unmount() */
+void
+zpool_disable_volume_os(const char *name)
+{
+	(void) name;
+}
+
+void
+zpool_unmount_mark_hard_force_begin(zpool_handle_t *zhp)
+{
+	(void) zhp;
+}
+
+void
+zpool_unmount_mark_hard_force_end(zpool_handle_t *zhp)
+{
+	(void) zhp;
+}

lib/libzfs/os/linux/libzfs_mount_os.c

Lines changed: 34 additions & 0 deletions
@@ -411,3 +411,37 @@ zfs_mount_delegation_check(void)
 {
 	return ((geteuid() != 0) ? EACCES : 0);
 }
+
+/* Called from the tail end of zpool_disable_datasets() */
+void
+zpool_disable_datasets_os(zpool_handle_t *zhp, boolean_t force)
+{
+	(void) zhp, (void) force;
+}
+
+/* Called from the tail end of zfs_unmount() */
+void
+zpool_disable_volume_os(const char *name)
+{
+	(void) name;
+}
+
+void
+zpool_unmount_mark_hard_force_begin(zpool_handle_t *zhp)
+{
+	zfs_cmd_t zc = {"\0"};
+	libzfs_handle_t *hdl = zhp->zpool_hdl;
+
+	(void) strlcpy(zc.zc_name, zhp->zpool_name, sizeof (zc.zc_name));
+	(void) zfs_ioctl(hdl, ZFS_IOC_HARD_FORCE_UNMOUNT_BEGIN, &zc);
+}
+
+void
+zpool_unmount_mark_hard_force_end(zpool_handle_t *zhp)
+{
+	zfs_cmd_t zc = {"\0"};
+	libzfs_handle_t *hdl = zhp->zpool_hdl;
+
+	(void) strlcpy(zc.zc_name, zhp->zpool_name, sizeof (zc.zc_name));
+	(void) zfs_ioctl(hdl, ZFS_IOC_HARD_FORCE_UNMOUNT_END, &zc);
+}

man/man4/zfs.4

Lines changed: 12 additions & 0 deletions
@@ -966,6 +966,18 @@ receive of encrypted datasets.
 Intended for users whose pools were created with
 OpenZFS pre-release versions and now have compatibility issues.
 .
+.It Sy zfs_forced_export_unmount_enabled Ns = Ns Sy 0 Ns | Ns 1 Pq int
+During forced unmount, leave the filesystem in a disabled mode of operation,
+in which all new I/Os fail, except for those required to unmount it.
+Intended for users trying to forcibly export a pool even when I/Os are in
+progress, without the need to find and stop them.
+This option does not affect processes that are merely sitting on the
+filesystem, only those performing active I/O.
+.Pp
+This parameter can be set to 1 to enable this behavior.
+.Pp
+This parameter only applies on Linux.
+.
 .It Sy zfs_key_max_salt_uses Ns = Ns Sy 400000000 Po 4*10^8 Pc Pq ulong
 Maximum number of uses of a single salt value before generating a new one for
 encrypted datasets.
