Commit 164cc4f

rikvanriel authored and torvalds committed
mm,thp,shmem: limit shmem THP alloc gfp_mask
Patch series "mm,thp,shm: limit shmem THP alloc gfp_mask", v6. The allocation flags of anonymous transparent huge pages can be controlled through the files in /sys/kernel/mm/transparent_hugepage/defrag, which can help the system from getting bogged down in the page reclaim and compaction code when many THPs are getting allocated simultaneously. However, the gfp_mask for shmem THP allocations were not limited by those configuration settings, and some workloads ended up with all CPUs stuck on the LRU lock in the page reclaim code, trying to allocate dozens of THPs simultaneously. This patch applies the same configurated limitation of THPs to shmem hugepage allocations, to prevent that from happening. This way a THP defrag setting of "never" or "defer+madvise" will result in quick allocation failures without direct reclaim when no 2MB free pages are available. With this patch applied, THP allocations for tmpfs will be a little more aggressive than today for files mmapped with MADV_HUGEPAGE, and a little less aggressive for files that are not mmapped or mapped without that flag. This patch (of 4): The allocation flags of anonymous transparent huge pages can be controlled through the files in /sys/kernel/mm/transparent_hugepage/defrag, which can help the system from getting bogged down in the page reclaim and compaction code when many THPs are getting allocated simultaneously. However, the gfp_mask for shmem THP allocations were not limited by those configuration settings, and some workloads ended up with all CPUs stuck on the LRU lock in the page reclaim code, trying to allocate dozens of THPs simultaneously. This patch applies the same configurated limitation of THPs to shmem hugepage allocations, to prevent that from happening. Controlling the gfp_mask of THP allocations through the knobs in sysfs allows users to determine the balance between how aggressively the system tries to allocate THPs at fault time, and how much the application may end up stalling attempting those allocations. This way a THP defrag setting of "never" or "defer+madvise" will result in quick allocation failures without direct reclaim when no 2MB free pages are available. With this patch applied, THP allocations for tmpfs will be a little more aggressive than today for files mmapped with MADV_HUGEPAGE, and a little less aggressive for files that are not mmapped or mapped without that flag. Link: https://lkml.kernel.org/r/20201124194925.623931-1-riel@surriel.com Link: https://lkml.kernel.org/r/20201124194925.623931-2-riel@surriel.com Signed-off-by: Rik van Riel <riel@surriel.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Xu Yu <xuyu@linux.alibaba.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1 parent: a656a20

3 files changed: 10 additions, 6 deletions

3 files changed

+10
-6
lines changed
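For context on what "limiting the gfp_mask" means in practice: the anonymous-THP path builds its allocation masks from the GFP_TRANSHUGE* bases below, and this series makes shmem do the same. The definitions are quoted from include/linux/gfp.h of this kernel generation, lightly reformatted; consult the tree at this commit for the authoritative versions.

/* From include/linux/gfp.h (quoted for context).
 * GFP_TRANSHUGE_LIGHT never triggers reclaim or compaction on its own;
 * GFP_TRANSHUGE additionally allows direct reclaim.
 */
#define GFP_TRANSHUGE_LIGHT	((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
				  __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
#define GFP_TRANSHUGE		(GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)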

include/linux/gfp.h

Lines changed: 2 additions & 0 deletions

@@ -634,6 +634,8 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask);
 extern void pm_restrict_gfp_mask(void);
 extern void pm_restore_gfp_mask(void);
 
+extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma);
+
 #ifdef CONFIG_PM_SLEEP
 extern bool pm_suspended_storage(void);
 #else
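The new declaration exports what was previously a file-local helper: the mm/huge_memory.c hunks below rename alloc_hugepage_direct_gfpmask() to vma_thp_gfp_mask() and drop the static inline, so that mm/shmem.c can reuse the same defrag-policy logic.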

mm/huge_memory.c

Lines changed: 3 additions & 3 deletions

@@ -668,9 +668,9 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
  *		  available
  * never: never stall for any thp allocation
  */
-static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
+gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma)
 {
-	const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
+	const bool vma_madvised = vma && (vma->vm_flags & VM_HUGEPAGE);
 
 	/* Always do synchronous compaction */
 	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags))
@@ -762,7 +762,7 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
 		}
 		return ret;
 	}
-	gfp = alloc_hugepage_direct_gfpmask(vma);
+	gfp = vma_thp_gfp_mask(vma);
 	page = alloc_hugepage_vma(gfp, vma, haddr, HPAGE_PMD_ORDER);
 	if (unlikely(!page)) {
 		count_vm_event(THP_FAULT_FALLBACK);
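The first hunk above shows only the top of the function. For reference, the full body of vma_thp_gfp_mask() (paraphrased from mm/huge_memory.c around this commit, with comments added) maps each /sys/kernel/mm/transparent_hugepage/defrag mode to a mask. The switch from !!(vma->vm_flags & VM_HUGEPAGE) to vma && (vma->vm_flags & VM_HUGEPAGE) matters because shmem can reach this helper without a faulting VMA, in which case a NULL vma is treated as not madvised:

gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma)
{
	/* A NULL vma (shmem call without a fault context) counts as
	 * not madvised. */
	const bool vma_madvised = vma && (vma->vm_flags & VM_HUGEPAGE);

	/* "always": synchronous compaction, but non-madvised mappings
	 * may still fail quickly via __GFP_NORETRY. */
	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
		     &transparent_hugepage_flags))
		return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY);

	/* "defer": wake kcompactd in the background, fail fast now. */
	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG,
		     &transparent_hugepage_flags))
		return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM;

	/* "defer+madvise": direct reclaim only when madvised,
	 * otherwise just wake kcompactd. */
	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG,
		     &transparent_hugepage_flags))
		return GFP_TRANSHUGE_LIGHT | (vma_madvised ?
					      __GFP_DIRECT_RECLAIM :
					      __GFP_KSWAPD_RECLAIM);

	/* "madvise": direct reclaim only when madvised. */
	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_MADV_FLAG,
		     &transparent_hugepage_flags))
		return GFP_TRANSHUGE_LIGHT | (vma_madvised ?
					      __GFP_DIRECT_RECLAIM : 0);

	/* "never": no stalls of any kind. */
	return GFP_TRANSHUGE_LIGHT;
}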

mm/shmem.c

Lines changed: 5 additions & 3 deletions

@@ -1519,8 +1519,8 @@ static struct page *shmem_alloc_hugepage(gfp_t gfp,
 		return NULL;
 
 	shmem_pseudo_vma_init(&pvma, info, hindex);
-	page = alloc_pages_vma(gfp | __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN,
-			       HPAGE_PMD_ORDER, &pvma, 0, numa_node_id(), true);
+	page = alloc_pages_vma(gfp, HPAGE_PMD_ORDER, &pvma, 0, numa_node_id(),
+			       true);
 	shmem_pseudo_vma_destroy(&pvma);
 	if (page)
 		prep_transhuge_page(page);
@@ -1776,6 +1776,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	struct page *page;
 	enum sgp_type sgp_huge = sgp;
 	pgoff_t hindex = index;
+	gfp_t huge_gfp;
 	int error;
 	int once = 0;
 	int alloced = 0;
@@ -1862,7 +1863,8 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	}
 
 alloc_huge:
-	page = shmem_alloc_and_acct_page(gfp, inode, index, true);
+	huge_gfp = vma_thp_gfp_mask(vma);
+	page = shmem_alloc_and_acct_page(huge_gfp, inode, index, true);
 	if (IS_ERR(page)) {
 alloc_nohuge:
 		page = shmem_alloc_and_acct_page(gfp, inode,
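Two things worth noting about the mm/shmem.c hunks above. First, the hardcoded __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN in shmem_alloc_hugepage() can be dropped because every mask returned by vma_thp_gfp_mask() is built on GFP_TRANSHUGE_LIGHT (quoted near the top of this page), which already includes __GFP_COMP and __GFP_NOWARN; whether the allocation may stall in reclaim is now decided by the defrag policy rather than by an unconditional __GFP_NORETRY. Second, only the huge attempt uses huge_gfp; the order-0 fallback at alloc_nohuge: keeps the caller-supplied gfp unchanged.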
