Commit 1f1d06c

rientjes authored and torvalds committed
thp, memcg: split hugepage for memcg oom on cow
On COW, a new hugepage is allocated and charged to the memcg. If the system is oom or the charge to the memcg fails, however, the fault handler will return VM_FAULT_OOM, which results in an oom kill.

Instead, it's possible to fall back to splitting the hugepage so that the COW results only in an order-0 page being allocated and charged to the memcg, which has a higher likelihood of succeeding. This is expensive because the hugepage must be split in the page fault handler, but it is much better than unnecessarily oom killing a process.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1 parent bde8bd8 commit 1f1d06c

2 files changed, 18 additions and 3 deletions

mm/huge_memory.c

Lines changed: 3 additions & 0 deletions
@@ -952,13 +952,16 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		count_vm_event(THP_FAULT_FALLBACK);
 		ret = do_huge_pmd_wp_page_fallback(mm, vma, address,
 						   pmd, orig_pmd, page, haddr);
+		if (ret & VM_FAULT_OOM)
+			split_huge_page(page);
 		put_page(page);
 		goto out;
 	}
 	count_vm_event(THP_FAULT_ALLOC);
 
 	if (unlikely(mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL))) {
 		put_page(new_page);
+		split_huge_page(page);
 		put_page(page);
 		ret |= VM_FAULT_OOM;
 		goto out;

mm/memory.c

Lines changed: 15 additions & 3 deletions
@@ -3486,6 +3486,7 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (unlikely(is_vm_hugetlb_page(vma)))
 		return hugetlb_fault(mm, vma, address, flags);
 
+retry:
 	pgd = pgd_offset(mm, address);
 	pud = pud_alloc(mm, pgd, address);
 	if (!pud)
@@ -3499,13 +3500,24 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 							  pmd, flags);
 	} else {
 		pmd_t orig_pmd = *pmd;
+		int ret;
+
 		barrier();
 		if (pmd_trans_huge(orig_pmd)) {
 			if (flags & FAULT_FLAG_WRITE &&
 			    !pmd_write(orig_pmd) &&
-			    !pmd_trans_splitting(orig_pmd))
-				return do_huge_pmd_wp_page(mm, vma, address,
-							   pmd, orig_pmd);
+			    !pmd_trans_splitting(orig_pmd)) {
+				ret = do_huge_pmd_wp_page(mm, vma, address, pmd,
+							  orig_pmd);
+				/*
+				 * If COW results in an oom, the huge pmd will
+				 * have been split, so retry the fault on the
+				 * pte for a smaller charge.
+				 */
+				if (unlikely(ret & VM_FAULT_OOM))
+					goto retry;
+				return ret;
+			}
 			return 0;
 		}
 	}
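
The control flow this patch introduces can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: every toy_* name, the FAULT_* values, and the two structs are hypothetical, and the 512 base pages per huge page assumes 2 MiB huge pages with 4 KiB base pages as on x86-64. It shows a huge COW charge failing, the mapping being split, and the fault being retried with an order-0 charge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: none of these names exist in the kernel. */
#define TOY_PAGES_PER_HUGE 512	/* assumes 2 MiB huge / 4 KiB base pages */

enum toy_fault { FAULT_OK = 0, FAULT_OOM = 1 };

struct toy_memcg {
	size_t limit;	/* charge limit, in base pages */
	size_t usage;	/* pages currently charged */
};

struct toy_mapping {
	bool huge;	/* true while the range is mapped by one huge pmd */
};

/* Charge nr_pages to the group; fail without side effects if over limit. */
static bool toy_charge(struct toy_memcg *cg, size_t nr_pages)
{
	if (cg->usage + nr_pages > cg->limit)
		return false;
	cg->usage += nr_pages;
	return true;
}

/*
 * Stands in for do_huge_pmd_wp_page(): if the huge charge fails, split
 * the mapping before reporting OOM so the caller can retry at pte level.
 */
static enum toy_fault toy_huge_cow(struct toy_memcg *cg,
				   struct toy_mapping *map)
{
	if (!toy_charge(cg, TOY_PAGES_PER_HUGE)) {
		map->huge = false;	/* models split_huge_page() */
		return FAULT_OOM;
	}
	return FAULT_OK;
}

/* Stands in for the pte-level COW path: a single order-0 charge. */
static enum toy_fault toy_pte_cow(struct toy_memcg *cg)
{
	return toy_charge(cg, 1) ? FAULT_OK : FAULT_OOM;
}

/* Stands in for the retry added to handle_mm_fault(). */
static enum toy_fault toy_handle_write_fault(struct toy_memcg *cg,
					     struct toy_mapping *map)
{
	assert(cg != NULL && map != NULL);
retry:
	if (map->huge) {
		enum toy_fault ret = toy_huge_cow(cg, map);

		/* After an OOM the mapping is split, so this loops at most once. */
		if (ret == FAULT_OOM)
			goto retry;
		return ret;
	}
	return toy_pte_cow(cg);
}
```

With a group limited to 100 pages and 90 already charged, toy_handle_write_fault() on a huge mapping fails the 512-page charge, splits the mapping, and then succeeds with a single-page charge. If the group has no headroom at all, the retry still returns FAULT_OOM, which corresponds to the case where an oom kill remains unavoidable.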
