Skip to content

Commit 3d5d319

Browse files
bertoggkevmw
authored andcommitted
blockjob: Make block_job_pause_all() keep a reference to the jobs
Starting from commit 40840e4 we are pausing all block jobs during bdrv_reopen_multiple() to prevent any of them from finishing and removing nodes from the graph while they are being reopened. It turns out that pausing a block job doesn't necessarily prevent it from finishing: a paused block job can still run its exit function from the main loop and call block_job_completed(). The mirror block job in particular always goes to the main loop while it is paused (by virtue of the bdrv_drained_begin() call in mirror_run()). Destroying a paused block job during bdrv_reopen_multiple() has two consequences: 1) The references to the nodes involved in the job are released, possibly destroying some of them. If those nodes were in the reopen queue this would trigger the problem originally described in commit 40840e4, crashing QEMU. 2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would not be doing all necessary bdrv_parent_drained_end() calls. I can reproduce problem 1) easily with iotest 030 by increasing STREAM_BUFFER_SIZE from 512KB to 8MB in block/stream.c, or by tweaking the iotest like in this example: https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html This patch keeps an additional reference to all block jobs between block_job_pause_all() and block_job_resume_all(), guaranteeing that they are kept alive. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
1 parent 495566e commit 3d5d319

File tree

1 file changed

+5
-2
lines changed

1 file changed

+5
-2
lines changed

blockjob.c

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -730,6 +730,7 @@ void block_job_pause_all(void)
730730
AioContext *aio_context = blk_get_aio_context(job->blk);
731731

732732
aio_context_acquire(aio_context);
733+
block_job_ref(job);
733734
block_job_pause(job);
734735
aio_context_release(aio_context);
735736
}
@@ -808,12 +809,14 @@ void coroutine_fn block_job_pause_point(BlockJob *job)
808809

809810
void block_job_resume_all(void)
810811
{
811-
BlockJob *job = NULL;
812-
while ((job = block_job_next(job))) {
812+
BlockJob *job, *next;
813+
814+
QLIST_FOREACH_SAFE(job, &block_jobs, job_list, next) {
813815
AioContext *aio_context = blk_get_aio_context(job->blk);
814816

815817
aio_context_acquire(aio_context);
816818
block_job_resume(job);
819+
block_job_unref(job);
817820
aio_context_release(aio_context);
818821
}
819822
}

0 commit comments

Comments
 (0)