
Commit abaa59a

wcaoshogbo authored and committed
zfs: support force exporting pools
This is primarily of use when a pool has lost its disk, while the user doesn't care about any pending (or otherwise) transactions.

Implement various control methods to make this feasible:
- txg_wait can now take a NOSUSPEND flag, in which case the caller will be alerted if their txg can't be committed. This is primarily of interest for callers that would normally pass TXG_WAIT, but don't want to wait if the pool becomes suspended, which allows unwinding in some cases, specifically when one is attempting a non-forced export. Without this, the non-forced export would preclude a forced export by virtue of holding the namespace lock indefinitely.
- txg_wait also returns failure for TXG_WAIT users if a pool is actually being force exported. Adjust most callers to tolerate this.
- spa_config_enter_flags now takes a NOSUSPEND flag to the same effect.
- DMU objset initiator which may be set on an objset being forcibly exported / unmounted.
- SPA export initiator may be set on a pool being forcibly exported.
- DMU send/recv now use an interruption mechanism which relies on the SPA export initiator being able to enumerate datasets and closing any send/recv streams, causing their EINTR paths to be invoked.
- ZIO now has a cancel entry point, which tells all suspended zios to fail, and which suppresses the failures for non-CANFAIL users.
- metaslab, etc. cleanup, which consists of simply throwing away any changes that were not able to be synced out.
- Linux specific: introduce a new tunable, zfs_forced_export_unmount_enabled, which allows the filesystem to remain in a modified 'unmounted' state upon exiting zpl_umount_begin, to achieve parity with FreeBSD and illumos, which have VFS-level support for yanking filesystems out from under users. However, this only helps when the user is actively performing I/O, while not sitting on the filesystem. In particular, this allows test #3 below to pass on Linux.
- Add basic logic to zpool to indicate a force-exporting pool, instead of crashing due to lack of config, etc.

Add tests which cover the basic use cases:
- Force export while a send is in progress
- Force export while a recv is in progress
- Force export while POSIX I/O is in progress

This change modifies the libzfs ABI:
- New ZPOOL_STATUS_FORCE_EXPORTING zpool_status_t enum value.
- New field libzfs_force_export for libzfs_handle.

Signed-off-by: Will Andrews <will@firepipe.net>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Catalogics, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes openzfs#3461
1 parent 620a977 commit abaa59a

88 files changed, +4247 -2617 lines changed

cmd/zpool/zpool_main.c

Lines changed: 6 additions & 3 deletions
@@ -363,7 +363,7 @@ get_usage(zpool_help_t idx)
 	case HELP_DETACH:
 		return (gettext("\tdetach <pool> <device>\n"));
 	case HELP_EXPORT:
-		return (gettext("\texport [-af] <pool> ...\n"));
+		return (gettext("\texport [-afF] <pool> ...\n"));
 	case HELP_HISTORY:
 		return (gettext("\thistory [-il] [<pool>] ...\n"));
 	case HELP_IMPORT:
@@ -1901,7 +1901,7 @@ zpool_export_one(zpool_handle_t *zhp, void *data)
 {
 	export_cbdata_t *cb = data;

-	if (zpool_disable_datasets(zhp, cb->force) != 0)
+	if (zpool_disable_datasets(zhp, cb->force || cb->hardforce) != 0)
 		return (1);

 	/* The history must be logged as part of the export */
@@ -1922,10 +1922,13 @@ zpool_export_one(zpool_handle_t *zhp, void *data)
  *
  * -a Export all pools
  * -f Forcefully unmount datasets
+ * -F Forcefully export, dropping all outstanding dirty data
  *
  * Export the given pools. By default, the command will attempt to cleanly
  * unmount any active datasets within the pool. If the '-f' flag is specified,
- * then the datasets will be forcefully unmounted.
+ * then the datasets will be forcefully unmounted. If the '-F' flag is
+ * specified, the pool's dirty data, if any, will simply be dropped after a
+ * best-effort attempt to forcibly stop all activity.
  */
 int
 zpool_do_export(int argc, char **argv)

include/libzfs.h

Lines changed: 1 addition & 0 deletions
@@ -419,6 +419,7 @@ typedef enum {
 	ZPOOL_STATUS_NON_NATIVE_ASHIFT, /* (e.g. 512e dev with ashift of 9) */
 	ZPOOL_STATUS_COMPATIBILITY_ERR, /* bad 'compatibility' property */
 	ZPOOL_STATUS_INCOMPATIBLE_FEAT, /* feature set outside compatibility */
+	ZPOOL_STATUS_FORCE_EXPORTING, /* pool is being force exported */

 	/*
 	 * Finally, the following indicates a healthy pool.

include/os/freebsd/spl/sys/thread.h

Lines changed: 3 additions & 0 deletions
@@ -31,4 +31,7 @@

 #define getcomm() curthread->td_name
 #define getpid() curthread->td_tid
+#define thread_signal spl_kthread_signal
+extern int spl_kthread_signal(kthread_t *tsk, int sig);
+
 #endif

include/os/freebsd/zfs/sys/zfs_znode_impl.h

Lines changed: 2 additions & 0 deletions
@@ -134,6 +134,8 @@ zfs_enter(zfsvfs_t *zfsvfs, const char *tag)
 	return (0);
 }

+#define zfs_enter_unmountok zfs_enter
+
 /* Must be called before exiting the vop */
 static inline void
 zfs_exit(zfsvfs_t *zfsvfs, const char *tag)

include/os/linux/spl/sys/thread.h

Lines changed: 2 additions & 0 deletions
@@ -53,6 +53,7 @@ typedef void (*thread_func_t)(void *);
 	__thread_create(stk, stksize, (thread_func_t)func, #func, \
 	arg, len, pp, state, pri)

+#define thread_signal(t, s) spl_kthread_signal(t, s)
 #define thread_exit() spl_thread_exit()
 #define thread_join(t) VERIFY(0)
 #define curthread current
@@ -64,6 +65,7 @@ extern kthread_t *__thread_create(caddr_t stk, size_t stksize,
     int state, pri_t pri);
 extern struct task_struct *spl_kthread_create(int (*func)(void *),
     void *data, const char namefmt[], ...);
+extern int spl_kthread_signal(kthread_t *tsk, int sig);

 static inline __attribute__((noreturn)) void
 spl_thread_exit(void)

include/os/linux/zfs/sys/zfs_vfsops_os.h

Lines changed: 2 additions & 1 deletion
@@ -101,7 +101,8 @@ struct zfsvfs {
 	boolean_t z_utf8; /* utf8-only */
 	int z_norm; /* normalization flags */
 	boolean_t z_relatime; /* enable relatime mount option */
-	boolean_t z_unmounted; /* unmounted */
+	boolean_t z_unmounted; /* mount status */
+	boolean_t z_force_unmounted; /* force-unmounted status */
 	rrmlock_t z_teardown_lock;
 	krwlock_t z_teardown_inactive_lock;
 	list_t z_all_znodes; /* all znodes in the fs */

include/os/linux/zfs/sys/zfs_znode_impl.h

Lines changed: 21 additions & 6 deletions
@@ -97,24 +97,39 @@ extern "C" {
 #define zhold(zp) VERIFY3P(igrab(ZTOI((zp))), !=, NULL)
 #define zrele(zp) iput(ZTOI((zp)))

+#define zfsvfs_is_unmounted(zfsvfs) \
+	((zfsvfs)->z_unmounted || (zfsvfs)->z_force_unmounted)
+
+/* Must be called before exiting the operation. */
+static inline void
+zfs_exit(zfsvfs_t *zfsvfs, const char *tag)
+{
+	zfs_exit_fs(zfsvfs);
+	ZFS_TEARDOWN_EXIT_READ(zfsvfs, tag);
+}
+
 /* Called on entry to each ZFS inode and vfs operation. */
 static inline int
 zfs_enter(zfsvfs_t *zfsvfs, const char *tag)
 {
 	ZFS_TEARDOWN_ENTER_READ(zfsvfs, tag);
-	if (unlikely(zfsvfs->z_unmounted)) {
+	if (unlikely(zfsvfs_is_unmounted(zfsvfs))) {
 		ZFS_TEARDOWN_EXIT_READ(zfsvfs, tag);
 		return (SET_ERROR(EIO));
 	}
 	return (0);
 }

-/* Must be called before exiting the operation. */
-static inline void
-zfs_exit(zfsvfs_t *zfsvfs, const char *tag)
+/* ZFS_ENTER but ok with forced unmount having begun */
+static inline int
+zfs_enter_unmountok(zfsvfs_t *zfsvfs, const char *tag)
 {
-	zfs_exit_fs(zfsvfs);
-	ZFS_TEARDOWN_EXIT_READ(zfsvfs, tag);
+	ZFS_TEARDOWN_ENTER_READ(zfsvfs, tag);
+	if (unlikely((zfsvfs)->z_unmounted == B_TRUE)) {
+		zfs_exit(zfsvfs, tag);
+		return (SET_ERROR(EIO));
+	}
+	return (0);
 }

 static inline int
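
To make the difference between the two Linux entry points easier to see, here is a small userspace model. It is a sketch under simplifying assumptions, not kernel code: the `mock_*` names are hypothetical, a plain counter stands in for the teardown read lock, and the `zfs_exit_fs()` step from the diff is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model of the two status bits in struct zfsvfs. */
typedef struct mock_zfsvfs {
	bool z_unmounted;	/* regular unmount has completed */
	bool z_force_unmounted;	/* forced unmount has begun */
	int readers;		/* stands in for the teardown read lock */
} mock_zfsvfs_t;

/* Models zfsvfs_is_unmounted(): either bit counts as unmounted. */
static bool
mock_is_unmounted(const mock_zfsvfs_t *z)
{
	return (z->z_unmounted || z->z_force_unmounted);
}

/* Models zfs_enter(): refuses entry once any kind of unmount is flagged. */
static int
mock_enter(mock_zfsvfs_t *z)
{
	z->readers++;
	if (mock_is_unmounted(z)) {
		z->readers--;
		return (EIO);
	}
	return (0);
}

/* Models zfs_enter_unmountok(): only a completed unmount blocks entry. */
static int
mock_enter_unmountok(mock_zfsvfs_t *z)
{
	z->readers++;
	if (z->z_unmounted) {
		z->readers--;
		return (EIO);
	}
	return (0);
}

/* Models zfs_exit(): drops the read hold taken on entry. */
static void
mock_exit(mock_zfsvfs_t *z)
{
	assert(z->readers > 0);
	z->readers--;
}
```

In this model, a filesystem with only `z_force_unmounted` set rejects `mock_enter` with `EIO` but still admits `mock_enter_unmountok`, which appears to be the purpose of the second entry point. On FreeBSD the diff simply aliases `zfs_enter_unmountok` to `zfs_enter`, so no such distinction exists there.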

include/sys/arc.h

Lines changed: 1 addition & 0 deletions
@@ -339,6 +339,7 @@ void l2arc_fini(void);
 void l2arc_start(void);
 void l2arc_stop(void);
 void l2arc_spa_rebuild_start(spa_t *spa);
+void l2arc_spa_rebuild_stop(spa_t *spa);

 #ifndef _KERNEL
 extern boolean_t arc_watch;

include/sys/dmu.h

Lines changed: 1 addition & 0 deletions
@@ -283,6 +283,7 @@ typedef enum dmu_object_type {
 #define TXG_NOWAIT (0ULL)
 #define TXG_WAIT (1ULL<<0)
 #define TXG_NOTHROTTLE (1ULL<<1)
+#define TXG_NOSUSPEND (1ULL<<2)

 void byteswap_uint64_array(void *buf, size_t size);
 void byteswap_uint32_array(void *buf, size_t size);
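
The commit message describes how `TXG_NOSUSPEND` changes the behaviour of callers that pass `TXG_WAIT`. The decision table below is a rough sketch of that description only. The flag values are copied from the hunk above, while `wait_outcome_t` and `mock_txg_wait_outcome` are hypothetical names, and the real functions return errno values that this diff excerpt does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as they appear in include/sys/dmu.h after this commit. */
#define	TXG_NOWAIT	(0ULL)
#define	TXG_WAIT	(1ULL<<0)
#define	TXG_NOTHROTTLE	(1ULL<<1)
#define	TXG_NOSUSPEND	(1ULL<<2)

/* Hypothetical outcome codes, used here instead of real errno values. */
typedef enum wait_outcome {
	WAIT_PROCEED,	/* pool is healthy, the wait can complete */
	WAIT_BLOCK,	/* pool suspended, caller keeps waiting */
	WAIT_FAIL	/* caller is told the txg cannot be committed */
} wait_outcome_t;

/*
 * Model of the behaviour described in the commit message, restricted to
 * callers that pass TXG_WAIT (possibly with modifier flags).
 */
static wait_outcome_t
mock_txg_wait_outcome(uint64_t flags, bool suspended, bool force_exporting)
{
	/* Only the three non-zero flags defined above are understood. */
	assert((flags & ~(TXG_WAIT | TXG_NOTHROTTLE | TXG_NOSUSPEND)) == 0);
	/* This sketch does not model TXG_NOWAIT callers. */
	assert((flags & TXG_WAIT) != 0);

	/* A force export fails every waiter, with or without NOSUSPEND. */
	if (force_exporting)
		return (WAIT_FAIL);

	/* A suspended pool only fails waiters that opted in. */
	if (suspended)
		return ((flags & TXG_NOSUSPEND) ? WAIT_FAIL : WAIT_BLOCK);

	return (WAIT_PROCEED);
}
```

Under this model, a non-forced export that passes `TXG_WAIT | TXG_NOSUSPEND` gets a failure back instead of blocking on a suspended pool, which is what lets it unwind and release the namespace lock for a later forced export.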

include/sys/dmu_impl.h

Lines changed: 1 addition & 0 deletions
@@ -241,6 +241,7 @@ typedef struct dmu_sendstatus {
 	list_node_t dss_link;
 	int dss_outfd;
 	proc_t *dss_proc;
+	kthread_t *dss_thread;
 	offset_t *dss_off;
 	uint64_t dss_blocks; /* blocks visited during the sending process */
 } dmu_sendstatus_t;

include/sys/dmu_objset.h

Lines changed: 5 additions & 0 deletions
@@ -172,6 +172,7 @@ struct objset {

 	/* Protected by os_lock */
 	kmutex_t os_lock;
+	kthread_t *os_shutdown_initiator;
 	multilist_t os_dirty_dnodes[TXG_SIZE];
 	list_t os_dnodes;
 	list_t os_downgraded_dbufs;
@@ -259,6 +260,10 @@ int dmu_fsname(const char *snapname, char *buf);
 void dmu_objset_evict_done(objset_t *os);
 void dmu_objset_willuse_space(objset_t *os, int64_t space, dmu_tx_t *tx);

+int dmu_objset_shutdown_register(objset_t *os);
+boolean_t dmu_objset_exiting(objset_t *os);
+void dmu_objset_shutdown_unregister(objset_t *os);
+
 void dmu_objset_init(void);
 void dmu_objset_fini(void);
264269

include/sys/dmu_recv.h

Lines changed: 4 additions & 0 deletions
@@ -40,6 +40,7 @@ extern const char *const recv_clone_name;

 typedef struct dmu_recv_cookie {
 	struct dsl_dataset *drc_ds;
+	kthread_t *drc_initiator;
 	struct dmu_replay_record *drc_drr_begin;
 	struct drr_begin *drc_drrb;
 	const char *drc_tofs;
@@ -57,6 +58,8 @@ typedef struct dmu_recv_cookie {
 	nvlist_t *drc_keynvl;
 	uint64_t drc_fromsnapobj;
 	uint64_t drc_ivset_guid;
+	unsigned int drc_flags;
+	void *drc_rwa;
 	void *drc_owner;
 	cred_t *drc_cred;
 	proc_t *drc_proc;
@@ -83,6 +86,7 @@ int dmu_recv_begin(char *, char *, dmu_replay_record_t *,
     boolean_t, boolean_t, boolean_t, nvlist_t *, nvlist_t *, char *,
     dmu_recv_cookie_t *, zfs_file_t *, offset_t *);
 int dmu_recv_stream(dmu_recv_cookie_t *, offset_t *);
+int dmu_recv_close(dsl_dataset_t *ds);
 int dmu_recv_end(dmu_recv_cookie_t *, void *);
 boolean_t dmu_objset_is_receiving(objset_t *);
8892

include/sys/dmu_send.h

Lines changed: 1 addition & 0 deletions
@@ -60,6 +60,7 @@ int dmu_send_obj(const char *pool, uint64_t tosnap, uint64_t fromsnap,
     boolean_t embedok, boolean_t large_block_ok, boolean_t compressok,
     boolean_t rawok, boolean_t savedok, int outfd, offset_t *off,
     struct dmu_send_outparams *dso);
+int dmu_send_close(struct dsl_dataset *ds);

 typedef int (*dmu_send_outfunc_t)(objset_t *os, void *buf, int len, void *arg);
 typedef struct dmu_send_outparams {

include/sys/dsl_dataset.h

Lines changed: 6 additions & 1 deletion
@@ -242,6 +242,8 @@ typedef struct dsl_dataset {
 	kmutex_t ds_sendstream_lock;
 	list_t ds_sendstreams;

+	struct dmu_recv_cookie *ds_receiver;
+
 	/*
 	 * When in the middle of a resumable receive, tracks how much
 	 * progress we have made.
@@ -324,7 +326,8 @@ typedef struct dsl_dataset_rename_snapshot_arg {
 /* flags for holding the dataset */
 typedef enum ds_hold_flags {
 	DS_HOLD_FLAG_NONE = 0 << 0,
-	DS_HOLD_FLAG_DECRYPT = 1 << 0 /* needs access to encrypted data */
+	DS_HOLD_FLAG_DECRYPT = 1 << 0, /* needs access to encrypted data */
+	DS_HOLD_FLAG_MUST_BE_OPEN = 1 << 1, /* dataset must already be open */
 } ds_hold_flags_t;

 int dsl_dataset_hold(struct dsl_pool *dp, const char *name, const void *tag,
@@ -453,6 +456,8 @@ void dsl_dataset_long_hold(dsl_dataset_t *ds, const void *tag);
 void dsl_dataset_long_rele(dsl_dataset_t *ds, const void *tag);
 boolean_t dsl_dataset_long_held(dsl_dataset_t *ds);

+int dsl_dataset_sendrecv_cancel_all(spa_t *spa);
+
 int dsl_dataset_clone_swap_check_impl(dsl_dataset_t *clone,
     dsl_dataset_t *origin_head, boolean_t force, void *owner, dmu_tx_t *tx);
 void dsl_dataset_clone_swap_sync_impl(dsl_dataset_t *clone,

include/sys/dsl_scan.h

Lines changed: 1 addition & 1 deletion
@@ -172,7 +172,7 @@ int dsl_scan(struct dsl_pool *, pool_scan_func_t);
 void dsl_scan_assess_vdev(struct dsl_pool *dp, vdev_t *vd);
 boolean_t dsl_scan_scrubbing(const struct dsl_pool *dp);
 int dsl_scrub_set_pause_resume(const struct dsl_pool *dp, pool_scrub_cmd_t cmd);
-void dsl_scan_restart_resilver(struct dsl_pool *, uint64_t txg);
+int dsl_scan_restart_resilver(struct dsl_pool *, uint64_t txg);
 boolean_t dsl_scan_resilvering(struct dsl_pool *dp);
 boolean_t dsl_scan_resilver_scheduled(struct dsl_pool *dp);
 boolean_t dsl_dataset_unstable(struct dsl_dataset *ds);

include/sys/metaslab.h

Lines changed: 1 addition & 0 deletions
@@ -114,6 +114,7 @@ boolean_t metaslab_class_throttle_reserve(metaslab_class_t *, int, int,
     zio_t *, int);
 void metaslab_class_throttle_unreserve(metaslab_class_t *, int, int, zio_t *);
 void metaslab_class_evict_old(metaslab_class_t *, uint64_t);
+void metaslab_class_force_discard(metaslab_class_t *);
 uint64_t metaslab_class_get_alloc(metaslab_class_t *);
 uint64_t metaslab_class_get_space(metaslab_class_t *);
 uint64_t metaslab_class_get_dspace(metaslab_class_t *);

include/sys/spa.h

Lines changed: 17 additions & 4 deletions
@@ -835,16 +835,13 @@ extern kmutex_t spa_namespace_lock;
  * SPA configuration functions in spa_config.c
  */

-#define SPA_CONFIG_UPDATE_POOL 0
-#define SPA_CONFIG_UPDATE_VDEVS 1
-
 extern void spa_write_cachefile(spa_t *, boolean_t, boolean_t, boolean_t);
 extern void spa_config_load(void);
 extern nvlist_t *spa_all_configs(uint64_t *);
 extern void spa_config_set(spa_t *spa, nvlist_t *config);
 extern nvlist_t *spa_config_generate(spa_t *spa, vdev_t *vd, uint64_t txg,
     int getstats);
-extern void spa_config_update(spa_t *spa, int what);
+extern int spa_config_update_pool(spa_t *spa);
 extern int spa_config_parse(spa_t *spa, vdev_t **vdp, nvlist_t *nv,
     vdev_t *parent, uint_t id, int atype);

@@ -961,6 +958,13 @@ extern void spa_iostats_trim_add(spa_t *spa, trim_type_t type,
     uint64_t extents_written, uint64_t bytes_written,
     uint64_t extents_skipped, uint64_t bytes_skipped,
     uint64_t extents_failed, uint64_t bytes_failed);
+
+/* Config lock handling flags */
+typedef enum {
+	SCL_FLAG_TRYENTER = 1U << 0,
+	SCL_FLAG_NOSUSPEND = 1U << 1,
+} spa_config_flag_t;
+
 extern void spa_import_progress_add(spa_t *spa);
 extern void spa_import_progress_remove(uint64_t spa_guid);
 extern int spa_import_progress_set_mmp_check(uint64_t pool_guid,
@@ -973,6 +977,8 @@ extern int spa_import_progress_set_state(uint64_t pool_guid,
 /* Pool configuration locks */
 extern int spa_config_tryenter(spa_t *spa, int locks, const void *tag,
     krw_t rw);
+extern int spa_config_enter_flags(spa_t *spa, int locks, const void *tag,
+    krw_t rw, spa_config_flag_t flags);
 extern void spa_config_enter(spa_t *spa, int locks, const void *tag, krw_t rw);
 extern void spa_config_exit(spa_t *spa, int locks, const void *tag);
 extern int spa_config_held(spa_t *spa, int locks, krw_t rw);
@@ -1021,6 +1027,7 @@ extern uint64_t spa_last_synced_txg(spa_t *spa);
 extern uint64_t spa_first_txg(spa_t *spa);
 extern uint64_t spa_syncing_txg(spa_t *spa);
 extern uint64_t spa_final_dirty_txg(spa_t *spa);
+extern void spa_verify_dirty_txg(spa_t *spa, uint64_t txg);
 extern uint64_t spa_version(spa_t *spa);
 extern pool_state_t spa_state(spa_t *spa);
 extern spa_load_state_t spa_load_state(spa_t *spa);
@@ -1040,6 +1047,8 @@ extern metaslab_class_t *spa_dedup_class(spa_t *spa);
 extern metaslab_class_t *spa_preferred_class(spa_t *spa, uint64_t size,
     dmu_object_type_t objtype, uint_t level, uint_t special_smallblk);

+extern void spa_evicting_os_lock(spa_t *);
+extern void spa_evicting_os_unlock(spa_t *);
 extern void spa_evicting_os_register(spa_t *, objset_t *os);
 extern void spa_evicting_os_deregister(spa_t *, objset_t *os);
 extern void spa_evicting_os_wait(spa_t *spa);
@@ -1131,6 +1140,10 @@ extern void spa_history_log_internal_dd(dsl_dir_t *dd, const char *operation,

 extern const char *spa_state_to_name(spa_t *spa);

+extern boolean_t spa_exiting_any(spa_t *spa);
+extern boolean_t spa_exiting(spa_t *spa);
+extern int spa_operation_interrupted(spa_t *spa);
+
 /* error handling */
 struct zbookmark_phys;
 extern void spa_log_error(spa_t *spa, const zbookmark_phys_t *zb);

include/sys/spa_impl.h

Lines changed: 1 addition & 0 deletions
@@ -244,6 +244,7 @@ struct spa {
 	kmutex_t spa_evicting_os_lock; /* Evicting objset list lock */
 	list_t spa_evicting_os_list; /* Objsets being evicted. */
 	kcondvar_t spa_evicting_os_cv; /* Objset Eviction Completion */
+	kthread_t *spa_export_initiator; /* thread exporting the pool */
 	txg_list_t spa_vdev_txg_list; /* per-txg dirty vdev list */
 	vdev_t *spa_root_vdev; /* top-level vdev container */
 	uint64_t spa_min_ashift; /* of vdevs in normal class */
