Worker thread gets stuck in posix_fadvise #1165

Closed
@luphye

Description

On Linux, posix_fadvise can block for a significant amount of time (10 ms or more) when the system is disturbed by other activity. The latency comes mainly from two sources:

  1. fadvise ->...-> lru_add_drain_all(): lru_add_drain_all(), which is called on the fadvise path, takes a global lock. When there are many fadvise (or lru_add_drain_all) calls on the system, acquiring this lock can take a long time.

  2. fadvise ->...-> vfs_fadvise() -> xfs_vm_writepages() ->...-> blk_mq_submit_bio(): blk_mq_submit_bio() can enter io_schedule() when the I/O queues run short. This means that when the disk (mainly an HDD) is busy with heavy I/O, fadvise may sleep, which adds latency to the workload on the thread that calls posix_fadvise.
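
To check whether either path is actually hit on a given machine, the call can be timed from user space. The following is a minimal sketch, not code from this project: `TimedAdvice` and `timed_fadvise` are hypothetical names introduced here for illustration.

```cpp
#include <fcntl.h>      // posix_fadvise, POSIX_FADV_*
#include <sys/types.h>  // off_t

#include <chrono>

// Hypothetical helper (not part of any library): result of one timed call.
struct TimedAdvice {
    int rc;                             // 0 on success, else an error number
    std::chrono::microseconds elapsed;  // wall-clock time spent in the call
};

// Calls posix_fadvise and measures how long the calling thread was blocked.
// Note that posix_fadvise returns the error number directly instead of
// setting errno.
inline TimedAdvice timed_fadvise(int fd, off_t offset, off_t len, int advice) {
    const auto start = std::chrono::steady_clock::now();
    const int rc = ::posix_fadvise(fd, offset, len, advice);
    const auto end = std::chrono::steady_clock::now();
    return TimedAdvice{
        rc, std::chrono::duration_cast<std::chrono::microseconds>(end - start)};
}
```

Logging `elapsed` for calls that exceed some threshold (for example 10 ms) while the disk is busy should show how often the worker thread stalls.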

Can we fix this by calling fadvise asynchronously in a dedicated thread rather than in the worker thread that handles logging? That way, the latency of posix_fadvise would no longer affect the real workload.
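
A rough sketch of what such a dedicated thread could look like is shown below. It is only an illustration under stated assumptions, not the project's implementation: the class name `AsyncFadvise` and its methods are hypothetical, and it assumes that duplicating the descriptor with dup() is acceptable so that the logging thread may close or rotate its file right after submitting a request.

```cpp
#include <fcntl.h>      // posix_fadvise, POSIX_FADV_*
#include <sys/types.h>  // off_t
#include <unistd.h>     // dup, close

#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical helper: runs posix_fadvise on a background thread so the
// caller never blocks inside the kernel fadvise path.
class AsyncFadvise {
public:
    AsyncFadvise() : worker_([this] { run(); }) {}

    ~AsyncFadvise() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker drains pending requests before exiting
    }

    AsyncFadvise(const AsyncFadvise&) = delete;
    AsyncFadvise& operator=(const AsyncFadvise&) = delete;

    // Called from the logging thread. Costs one dup() and one queue push.
    // Returns false if the descriptor could not be duplicated.
    bool submit(int fd, off_t offset, off_t len, int advice) {
        assert(offset >= 0 && len >= 0);
        const int dup_fd = ::dup(fd);  // assumption: private copy decouples lifetimes
        if (dup_fd < 0) return false;
        {
            std::lock_guard<std::mutex> lock(mu_);
            queue_.push_back(Request{dup_fd, offset, len, advice});
        }
        cv_.notify_one();
        return true;
    }

    // Number of requests the background thread has finished so far.
    int completed() const { return completed_.load(); }

private:
    struct Request {
        int fd;
        off_t offset;
        off_t len;
        int advice;
    };

    void run() {
        std::unique_lock<std::mutex> lock(mu_);
        for (;;) {
            cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
            if (queue_.empty()) return;  // stop requested and nothing left
            const Request r = queue_.front();
            queue_.pop_front();
            lock.unlock();
            // The potentially slow call happens here, off the logging thread.
            ::posix_fadvise(r.fd, r.offset, r.len, r.advice);
            ::close(r.fd);
            ++completed_;
            lock.lock();
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<Request> queue_;
    bool stop_ = false;
    std::atomic<int> completed_{0};
    std::thread worker_;  // declared last so it starts after the other members
};
```

With this shape, the logging thread would call something like `advisor.submit(fd, 0, 0, POSIX_FADV_DONTNEED)` after a write or flush and return immediately. The queue in this sketch is unbounded, so a real implementation would probably want to cap it or coalesce requests for the same file when the disk stays slow for a long time.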
