mm, memory_hotplug: remove timeout from __offline_memory
We have had a hardcoded 120s timeout, after which the memory offline
fails, basically since hot remove was introduced.  This is essentially
a policy implemented in the kernel.  Moreover, there is no way to adjust
the timeout, so we sometimes face memory offline failures when the
system is under heavy memory pressure or a very intensive CPU workload
on large machines.

It is not very clear what purpose the timeout actually serves.  The
offline operation is interruptible by a signal so if userspace wants
some timeout based termination this can be done trivially by sending a
signal.

If there is a strong usecase to do this from the kernel then we should
do it properly and make it tunable from userspace, with the timeout
disabled by default, along with an explanation of who uses it and for
what purpose.

Link: http://lkml.kernel.org/r/20170918070834.13083-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko authored and torvalds committed Nov 16, 2017
1 parent 72b39cf commit ecde0f3
Showing 1 changed file with 3 additions and 7 deletions.
10 changes: 3 additions & 7 deletions mm/memory_hotplug.c
@@ -1590,9 +1590,9 @@ static void node_states_clear_node(int node, struct memory_notify *arg)
 }
 
 static int __ref __offline_pages(unsigned long start_pfn,
-		  unsigned long end_pfn, unsigned long timeout)
+		  unsigned long end_pfn)
 {
-	unsigned long pfn, nr_pages, expire;
+	unsigned long pfn, nr_pages;
 	long offlined_pages;
 	int ret, node;
 	unsigned long flags;
@@ -1630,12 +1630,8 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		goto failed_removal;
 
 	pfn = start_pfn;
-	expire = jiffies + timeout;
 repeat:
 	/* start memory hot removal */
-	ret = -EBUSY;
-	if (time_after(jiffies, expire))
-		goto failed_removal;
 	ret = -EINTR;
 	if (signal_pending(current))
 		goto failed_removal;
@@ -1708,7 +1704,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 /* Must be protected by mem_hotplug_begin() or a device_lock */
 int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 {
-	return __offline_pages(start_pfn, start_pfn + nr_pages, 120 * HZ);
+	return __offline_pages(start_pfn, start_pfn + nr_pages);
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
