Merge tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:

 - NVMe updates via Christoph:
      - Small improvements to the logging functionality (Amit Engel)
      - Authentication cleanups (Hannes Reinecke)
      - Cleanup and optimize the DMA mapping code in the PCIe driver
        (Keith Busch)
      - Work around the command effects for Format NVM (Keith Busch)
      - Misc cleanups (Keith Busch, Christoph Hellwig)
      - Fix and cleanup freeing single sgl (Keith Busch)

 - MD updates via Song:
      - Fix a rare crash during the takeover process
      - Don't update recovery_cp when curr_resync is ACTIVE
      - Free writes_pending in md_stop
      - Change active_io to percpu

 - Updates to drbd, inching us closer to unifying the out-of-tree driver
   with the in-tree one (Andreas, Christoph, Lars, Robert)

 - BFQ update adding support for multi-actuator drives (Paolo, Federico,
   Davide)

 - Make brd compliant with REQ_NOWAIT (me)

 - Fix for IOPOLL and queue entering, fixing stalled IO waiting on
   timeouts (me)

 - Fix for REQ_NOWAIT with multiple bios (me)

 - Fix memory leak in blktrace cleanup (Greg)

 - Clean up sbitmap and fix a potential hang (Kemeng)

 - Clean up some bits in BFQ, and fix a bug in the request injection
   (Kemeng)

 - Clean up the request allocation and issue code, and fix some bugs
   related to that (Kemeng)

 - ublk updates and fixes:
      - Add support for unprivileged ublk (Ming)
      - Improve device deletion handling (Ming)
      - Misc (Liu, Ziyang)

 - s390 dasd fixes (Alexander, Qiheng)

 - Improve utility of request caching and fixes (Anuj, Xiao)

 - zoned cleanups (Pankaj)

 - More constification for kobjs (Thomas)

 - blk-iocost cleanups (Yu)

 - Remove bio splitting from drivers that don't need it (Christoph)

 - Switch blk-cgroups to use struct gendisk. Some of this is now
   incomplete as select late reverts were done. (Christoph)

 - Add bvec initialization helpers, and convert callers to use that
   rather than open-coding it (Christoph)

 - Misc fixes and cleanups (Jinke, Keith, Arnd, Bart, Li, Martin,
   Matthew, Ulf, Zhong)

* tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux: (169 commits)
  brd: use radix_tree_maybe_preload instead of radix_tree_preload
  block: use proper return value from bio_failfast()
  block: bio-integrity: Copy flags when bio_integrity_payload is cloned
  block: Fix io statistics for cgroup in throttle path
  brd: mark as nowait compatible
  brd: check for REQ_NOWAIT and set correct page allocation mask
  brd: return 0/-error from brd_insert_page()
  block: sync mixed merged request's failfast with 1st bio's
  Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"
  Revert "blk-cgroup: pass a gendisk to blkg_lookup"
  Revert "blk-cgroup: delay blk-cgroup initialization until add_disk"
  Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release"
  Revert "blk-cgroup: move the cgroup information to struct gendisk"
  nvme-pci: remove iod use_sgls
  nvme-pci: fix freeing single sgl
  block: ublk: check IO buffer based on flag need_get_data
  s390/dasd: Fix potential memleak in dasd_eckd_init()
  s390/dasd: sort out physical vs virtual pointers usage
  block: Remove the ALLOC_CACHE_SLACK constant
  block: make kobj_type structures constant
  ...
torvalds committed Feb 20, 2023
2 parents 553637f + 0aa2988 commit 5b0ed59
Showing 109 changed files with 2,224 additions and 1,599 deletions.
3 changes: 2 additions & 1 deletion Documentation/ABI/stable/sysfs-block
@@ -432,7 +432,8 @@ Contact: linux-block@vger.kernel.org
Description:
[RW] This is the maximum number of kilobytes that the block
layer will allow for a filesystem request. Must be smaller than
or equal to the maximum size allowed by the hardware.
or equal to the maximum size allowed by the hardware. Write 0
to use default kernel settings.


What: /sys/block/<disk>/queue/max_segment_size
10 changes: 0 additions & 10 deletions Documentation/block/capability.rst

This file was deleted.

1 change: 0 additions & 1 deletion Documentation/block/index.rst
@@ -10,7 +10,6 @@ Block
bfq-iosched
biovecs
blk-mq
capability
cmdline-partition
data-integrity
deadline-iosched
55 changes: 46 additions & 9 deletions Documentation/block/ublk.rst
@@ -144,6 +144,43 @@ managing and controlling ublk devices with help of several control commands:
For retrieving device info via ``ublksrv_ctrl_dev_info``. It is the server's
responsibility to save IO target specific info in userspace.

- ``UBLK_CMD_GET_DEV_INFO2``
Same purpose as ``UBLK_CMD_GET_DEV_INFO``, but the ublk server has to
provide the path of the char device ``/dev/ublkc*`` so that the kernel can
run a permission check. This command was added to support unprivileged
ublk devices and was introduced together with ``UBLK_F_UNPRIVILEGED_DEV``.
Only the user owning the requested device can retrieve the device info.

How to deal with userspace/kernel compatibility:

1) if kernel is capable of handling ``UBLK_F_UNPRIVILEGED_DEV``

If ublk server supports ``UBLK_F_UNPRIVILEGED_DEV``:

The ublk server should send ``UBLK_CMD_GET_DEV_INFO2``. An unprivileged
application may need to query the devices the current user owns at any
time, and because the capability info is stateless, it cannot know whether
``UBLK_F_UNPRIVILEGED_DEV`` is set. The application should therefore
always retrieve the info via ``UBLK_CMD_GET_DEV_INFO2``.

If ublk server doesn't support ``UBLK_F_UNPRIVILEGED_DEV``:

``UBLK_CMD_GET_DEV_INFO`` is always sent to the kernel, and the
``UBLK_F_UNPRIVILEGED_DEV`` feature isn't available to the user.

2) if kernel isn't capable of handling ``UBLK_F_UNPRIVILEGED_DEV``

If ublk server supports ``UBLK_F_UNPRIVILEGED_DEV``:

``UBLK_CMD_GET_DEV_INFO2`` is tried first and fails; the server then
needs to retry with ``UBLK_CMD_GET_DEV_INFO``, given that
``UBLK_F_UNPRIVILEGED_DEV`` can't be set.

If ublk server doesn't support ``UBLK_F_UNPRIVILEGED_DEV``:

``UBLK_CMD_GET_DEV_INFO`` is always sent to the kernel, and the
``UBLK_F_UNPRIVILEGED_DEV`` feature isn't available to the user.

- ``UBLK_CMD_START_USER_RECOVERY``

This command is valid if ``UBLK_F_USER_RECOVERY`` feature is enabled. This
@@ -180,6 +217,15 @@ managing and controlling ublk devices with help of several control commands:
double-write since the driver may issue the same I/O request twice. It
might be useful to a read-only FS or a VM backend.

Unprivileged ublk devices are supported by passing ``UBLK_F_UNPRIVILEGED_DEV``.
Once the flag is set, all control commands can be sent by an unprivileged
user. Except for ``UBLK_CMD_ADD_DEV``, the ublk driver runs a permission
check on the specified char device (``/dev/ublkc*``) for every control
command; to make that possible, the ublk server has to provide the path of
the char device in the command's payload. In this way, a ublk device becomes
container-aware: a device created in one container can be controlled and
accessed only inside that container.

Data plane
----------

@@ -254,15 +300,6 @@ with specified IO tag in the command data:
Future development
==================

Container-aware ublk device
----------------------------

ublk driver doesn't handle any IO logic. Its function is well defined
for now and very limited userspace interfaces are needed, which is also
well defined too. It is possible to make ublk devices container-aware block
devices in future as Stefan Hajnoczi suggested [#stefan]_, by removing
ADMIN privilege.

Zero copy
---------

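Note on the ublk.rst change above: the userspace/kernel compatibility rules for UBLK_CMD_GET_DEV_INFO2 amount to a try-then-fall-back sequence. The following is a minimal C sketch of that sequence, not code from this commit or from any ublk server. The sketch_* names, the placeholder command values, the fake kernels, and the callback-based transport are all assumptions made for illustration; a real server would issue the commands on the ublk control device and take the command numbers from the kernel UAPI header. The sketch also falls back on any failure of the first command, which is a simplification.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder command IDs for this sketch only; not the real UAPI values. */
enum sketch_cmd {
	SKETCH_GET_DEV_INFO = 1,
	SKETCH_GET_DEV_INFO2 = 2,
};

/* Hypothetical transport: sends one control command, returns 0 or -errno. */
typedef int (*sketch_send_fn)(enum sketch_cmd cmd, void *ctx);

/* Fake kernel that understands both commands. */
int sketch_new_kernel(enum sketch_cmd cmd, void *ctx)
{
	(void)cmd;
	(void)ctx;
	return 0;
}

/* Fake kernel that predates the INFO2 command and rejects it. */
int sketch_old_kernel(enum sketch_cmd cmd, void *ctx)
{
	(void)ctx;
	return cmd == SKETCH_GET_DEV_INFO2 ? -EINVAL : 0;
}

/*
 * Query device info following the compatibility rules:
 *  - a server that supports the unprivileged flag tries INFO2 first and
 *    retries with the legacy command if that fails;
 *  - a server without that support always sends the legacy command.
 * On success, *used records which command succeeded.
 */
int sketch_query_dev_info(sketch_send_fn send, void *ctx,
			  bool server_supports_unpriv,
			  enum sketch_cmd *used)
{
	int ret;

	if (server_supports_unpriv) {
		ret = send(SKETCH_GET_DEV_INFO2, ctx);
		if (ret == 0) {
			*used = SKETCH_GET_DEV_INFO2;
			return 0;
		}
		/* INFO2 failed (for example on an older kernel): fall back. */
	}

	ret = send(SKETCH_GET_DEV_INFO, ctx);
	if (ret == 0)
		*used = SKETCH_GET_DEV_INFO;
	return ret;
}
```

With the fake kernels above, a server that supports the flag ends up using the INFO2 command against sketch_new_kernel and the legacy command against sketch_old_kernel, while a server without that support uses the legacy command in both cases.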
1 change: 1 addition & 0 deletions MAINTAINERS
@@ -6425,6 +6425,7 @@ T: git git://git.linbit.com/linux-drbd.git
T: git git://git.linbit.com/drbd-8.4.git
F: Documentation/admin-guide/blockdev/
F: drivers/block/drbd/
F: include/linux/drbd*
F: lib/lru_cache.c

DRIVER COMPONENT FRAMEWORK
1 change: 1 addition & 0 deletions block/Kconfig.iosched
@@ -30,6 +30,7 @@ config IOSCHED_BFQ
config BFQ_GROUP_IOSCHED
bool "BFQ hierarchical scheduling support"
depends on IOSCHED_BFQ && BLK_CGROUP
default y
select BLK_CGROUP_RWSTAT
help

105 changes: 55 additions & 50 deletions block/bfq-cgroup.c
@@ -513,12 +513,12 @@ static void bfq_cpd_free(struct blkcg_policy_data *cpd)
kfree(cpd_to_bfqgd(cpd));
}

static struct blkg_policy_data *bfq_pd_alloc(gfp_t gfp, struct request_queue *q,
struct blkcg *blkcg)
static struct blkg_policy_data *bfq_pd_alloc(struct gendisk *disk,
struct blkcg *blkcg, gfp_t gfp)
{
struct bfq_group *bfqg;

bfqg = kzalloc_node(sizeof(*bfqg), gfp, q->node);
bfqg = kzalloc_node(sizeof(*bfqg), gfp, disk->node_id);
if (!bfqg)
return NULL;

@@ -551,7 +551,6 @@ static void bfq_pd_init(struct blkg_policy_data *pd)
bfqg->bfqd = bfqd;
bfqg->active_entities = 0;
bfqg->num_queues_with_pending_reqs = 0;
bfqg->online = true;
bfqg->rq_pos_tree = RB_ROOT;
}

@@ -614,7 +613,7 @@ struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio)
continue;
}
bfqg = blkg_to_bfqg(blkg);
if (bfqg->online) {
if (bfqg->pd.online) {
bio_associate_blkg_from_css(bio, &blkg->blkcg->css);
return bfqg;
}
@@ -706,12 +705,52 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
bfq_activate_bfqq(bfqd, bfqq);
}

if (!bfqd->in_service_queue && !bfqd->rq_in_driver)
if (!bfqd->in_service_queue && !bfqd->tot_rq_in_driver)
bfq_schedule_dispatch(bfqd);
/* release extra ref taken above, bfqq may happen to be freed now */
bfq_put_queue(bfqq);
}

static void bfq_sync_bfqq_move(struct bfq_data *bfqd,
struct bfq_queue *sync_bfqq,
struct bfq_io_cq *bic,
struct bfq_group *bfqg,
unsigned int act_idx)
{
struct bfq_queue *bfqq;

if (!sync_bfqq->new_bfqq && !bfq_bfqq_coop(sync_bfqq)) {
/* We are the only user of this bfqq, just move it */
if (sync_bfqq->entity.sched_data != &bfqg->sched_data)
bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
return;
}

/*
* The queue was merged to a different queue. Check
* that the merge chain still belongs to the same
* cgroup.
*/
for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq)
if (bfqq->entity.sched_data != &bfqg->sched_data)
break;
if (bfqq) {
/*
* Some queue changed cgroup so the merge is not valid
* anymore. We cannot easily just cancel the merge (by
* clearing new_bfqq) as there may be other processes
* using this queue and holding refs to all queues
* below sync_bfqq->new_bfqq. Similarly if the merge
* already happened, we need to detach from bfqq now
* so that we cannot merge bio to a request from the
* old cgroup.
*/
bfq_put_cooperator(sync_bfqq);
bic_set_bfqq(bic, NULL, true, act_idx);
bfq_release_process_ref(bfqd, sync_bfqq);
}
}

/**
* __bfq_bic_change_cgroup - move @bic to @bfqg.
* @bfqd: the queue descriptor.
@@ -726,53 +765,20 @@ static void __bfq_bic_change_cgroup(struct bfq_data *bfqd,
struct bfq_io_cq *bic,
struct bfq_group *bfqg)
{
struct bfq_queue *async_bfqq = bic_to_bfqq(bic, false);
struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, true);
struct bfq_entity *entity;
unsigned int act_idx;

if (async_bfqq) {
entity = &async_bfqq->entity;
for (act_idx = 0; act_idx < bfqd->num_actuators; act_idx++) {
struct bfq_queue *async_bfqq = bic_to_bfqq(bic, false, act_idx);
struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, true, act_idx);

if (entity->sched_data != &bfqg->sched_data) {
bic_set_bfqq(bic, NULL, false);
if (async_bfqq &&
async_bfqq->entity.sched_data != &bfqg->sched_data) {
bic_set_bfqq(bic, NULL, false, act_idx);
bfq_release_process_ref(bfqd, async_bfqq);
}
}

if (sync_bfqq) {
if (!sync_bfqq->new_bfqq && !bfq_bfqq_coop(sync_bfqq)) {
/* We are the only user of this bfqq, just move it */
if (sync_bfqq->entity.sched_data != &bfqg->sched_data)
bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
} else {
struct bfq_queue *bfqq;

/*
* The queue was merged to a different queue. Check
* that the merge chain still belongs to the same
* cgroup.
*/
for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq)
if (bfqq->entity.sched_data !=
&bfqg->sched_data)
break;
if (bfqq) {
/*
* Some queue changed cgroup so the merge is
* not valid anymore. We cannot easily just
* cancel the merge (by clearing new_bfqq) as
* there may be other processes using this
* queue and holding refs to all queues below
* sync_bfqq->new_bfqq. Similarly if the merge
* already happened, we need to detach from
* bfqq now so that we cannot merge bio to a
* request from the old cgroup.
*/
bfq_put_cooperator(sync_bfqq);
bic_set_bfqq(bic, NULL, true);
bfq_release_process_ref(bfqd, sync_bfqq);
}
}
if (sync_bfqq)
bfq_sync_bfqq_move(bfqd, sync_bfqq, bic, bfqg, act_idx);
}
}

@@ -978,7 +984,6 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)

put_async_queues:
bfq_put_async_queues(bfqd, bfqg);
bfqg->online = false;

spin_unlock_irqrestore(&bfqd->lock, flags);
/*
@@ -1284,7 +1289,7 @@ struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node)
{
int ret;

ret = blkcg_activate_policy(bfqd->queue, &blkcg_policy_bfq);
ret = blkcg_activate_policy(bfqd->queue->disk, &blkcg_policy_bfq);
if (ret)
return NULL;

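Note on the bfq-cgroup.c change above: the merge-chain check that this commit factors out into bfq_sync_bfqq_move() relies on a common C idiom. It walks a singly linked chain, breaks on the first mismatch, and then uses the loop variable to tell whether the walk ran to completion. A minimal standalone sketch of that idiom follows; the sketch_* types and function are hypothetical stand-ins for struct bfq_queue and struct bfq_group, which carry far more state in the kernel.

```c
#include <stddef.h>

/* Stand-in for a cgroup's scheduling data (hypothetical, sketch only). */
struct sketch_group {
	int id;
};

/* Stand-in for a queue that may have been merged into another queue. */
struct sketch_queue {
	const struct sketch_group *group;	/* plays the role of entity.sched_data */
	struct sketch_queue *new_queue;		/* plays the role of new_bfqq */
};

/*
 * Walk the merge chain starting at q. Return the first queue that does not
 * belong to group, or NULL if every queue in the chain belongs to it.
 * A non-NULL result corresponds to the "merge is not valid anymore" case.
 */
const struct sketch_queue *
sketch_first_foreign_queue(const struct sketch_queue *q,
			   const struct sketch_group *group)
{
	for (; q; q = q->new_queue)
		if (q->group != group)
			break;
	return q;
}
```

In the kernel code, a non-NULL loop variable after the walk is what triggers detaching the sync queue from the I/O context instead of moving it.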