mm: Reclaim more pages to find free pages in compaction
There were many reports of order-3 allocation failures while the VM
still had lots of *reclaimable* memory.

[17353.434071] kworker/u16:4 invoked oom-killer: gfp_mask=0x6160c0(GFP_KERNEL|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_MEMALLOC), nodemask=(null), order=3, oom_score_adj=0
[17353.434079] kworker/u16:4 cpuset=/ mems_allowed=0
[17353.434086] CPU: 6 PID: 30045 Comm: kworker/u16:4 Tainted: G S      WC O      4.19.95-g8137b6ce669e-ab6554412 #1
[17353.434089] Hardware name: Google Inc. MSM sm7250 v2 Bramble DVT (DT)
[17353.434194] Workqueue: iparepwq95 __typeid__ZTSFiP44ipa_disable_force_clear_datapath_req_msg_v01E_global_addr [ipa3]
[17353.434197] Call trace:
[17353.434206] __typeid__ZTSFjP11task_structPK11user_regsetE_global_addr+0x14/0x18
[17353.434210] dump_stack+0xbc/0xf8
[17353.434217] dump_header+0xc8/0x250
[17353.434220] oom_kill_process+0x130/0x538
[17353.434222] out_of_memory+0x320/0x444
[17353.434226] __alloc_pages_nodemask+0x1124/0x13b4
[17353.434314] ipa3_alloc_rx_pkt_page+0x64/0x1a8 [ipa3]
[17353.434403] ipa3_wq_page_repl+0x78/0x1a4 [ipa3]
[17353.434407] process_one_work+0x3a8/0x6e4
[17353.434410] worker_thread+0x394/0x820
[17353.434413] kthread+0x19c/0x1ac
[17353.434417] ret_from_fork+0x10/0x18
[17353.434419] Mem-Info:
[17353.434424] active_anon:357378 inactive_anon:119141 isolated_anon:13\x0a active_file:97495 inactive_file:122151 isolated_file:22\x0a unevictable:49750 dirty:3553 writeback:0 unstable:0\x0a slab_reclaimable:30018 slab_unreclaimable:73884\x0a mapped:259586 shmem:27580 pagetables:39581 bounce:0\x0a free:17710 free_pcp:301 free_cma:0
[17353.434433] Node 0 active_anon:1429512kB inactive_anon:476564kB active_file:389980kB inactive_file:488604kB unevictable:199000kB isolated(anon):52kB isolated(file):88kB mapped:1038344kB dirty:14212kB writeback:0kB shmem:110320kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[17353.434439] Normal free:70840kB min:9172kB low:43900kB high:49484kB active_anon:1429284kB inactive_anon:476336kB active_file:389980kB inactive_file:488604kB unevictable:199000kB writepending:14212kB present:5764280kB managed:5584928kB mlocked:199000kB kernel_stack:92656kB shadow_call_stack:5792kB pagetables:158324kB bounce:0kB free_pcp:1204kB local_pcp:108kB free_cma:0kB
[17353.434441] lowmem_reserve[]: 0 0
[17353.434444] Normal: 8956*4kB (UMEH) 2726*8kB (UH) 751*16kB (UH) 33*32kB (H) 7*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 71152kB
[17353.434451] 300317 total pagecache pages
[17353.434454] 4228 pages in swap cache
[17353.434456] Swap cache stats: add 20710158, delete 20707317, find 1014864/9891370
[17353.434459] Free swap  = 103732kB
[17353.434460] Total swap = 2097148kB
[17353.434462] 1441070 pages RAM
[17353.434465] 0 pages HighMem/MovableOnly
[17353.434466] 44838 pages reserved
[17353.434469] 73728 pages cma reserved
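
The "Normal:" buddy line in the report above already hints at the problem: nearly all free memory sits in blocks smaller than order 3. A minimal userspace sketch (not kernel code; the array and function names are made up for illustration, and 4kB base pages are assumed) that adds up the counts copied from that line:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Free-block counts per order, copied from the "Normal:" line of the
 * report (index 0 = 4kB blocks ... index 10 = 4096kB blocks).
 */
static const unsigned long normal_free_blocks[11] = {
	8956, 2726, 751, 33, 7, 0, 0, 0, 0, 0, 0
};

/*
 * Return the free memory in kB held in blocks of order >= min_order,
 * assuming a 4kB base page size.
 */
static unsigned long free_kb_at_or_above(const unsigned long *blocks,
					 size_t nr_orders,
					 unsigned int min_order)
{
	unsigned long kb = 0;

	assert(blocks != NULL);
	for (size_t order = min_order; order < nr_orders; order++)
		kb += blocks[order] * (4UL << order);
	return kb;
}
```

With min_order = 0 this gives 71152kB, matching the total printed in the log. With min_order = 3 it gives only 1504kB (33*32kB + 7*64kB), and those blocks carry only the (H) tag, which as far as I can tell marks the high-atomic reserve in this report format.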

In the trace, compaction finished with COMPACT_COMPLETE (IOW, it had
already scanned the whole zone but failed to produce a free order-3
page), so should_compact_retry() returned "false".

           <...>-30045 [006] .... 17353.433704: reclaim_retry_zone: node=0 zone=Normal   order=3 reclaimable=696132 available=713920 min_wmark=2293 no_progress_loops=0 wmark_check=0
           <...>-30045 [006] .... 17353.433706: compact_retry: order=3 priority=COMPACT_PRIO_SYNC_FULL compaction_result=failed retries=0 max_retries=16 should_retry=0

The earlier part of the trace shows that compaction had a hard time
finding free pages in the zone, so the free scanner moved quickly
toward the migration scanner until the two scanners (migration scanner
and free page scanner) finally crossed over.

           <...>-30045 [006] .... 17353.427026: mm_compaction_isolate_freepages: range=(0x144c00 ~ 0x145000) nr_scanned=784 nr_taken=0
           <...>-30045 [006] .... 17353.427037: mm_compaction_isolate_freepages: range=(0x144800 ~ 0x144c00) nr_scanned=1019 nr_taken=0
           <...>-30045 [006] .... 17353.427049: mm_compaction_isolate_freepages: range=(0x144400 ~ 0x144800) nr_scanned=880 nr_taken=1
           <...>-30045 [006] .... 17353.427061: mm_compaction_isolate_freepages: range=(0x144000 ~ 0x144400) nr_scanned=869 nr_taken=0
           <...>-30045 [006] .... 17353.427212: mm_compaction_isolate_freepages: range=(0x140c00 ~ 0x141000) nr_scanned=1016 nr_taken=0
..
..
           <...>-30045 [006] .... 17353.433696: mm_compaction_finished: node=0 zone=Normal   order=3 ret=complete
           <...>-30045 [006] .... 17353.433698: mm_compaction_end: zone_start=0x80600 migrate_pfn=0xc9400 free_pfn=0xc9500 zone_end=0x200000, mode=sync status=complete
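
The "crossed over" condition can be checked against the final pfns in mm_compaction_end. The sketch below is a userspace model of the pageblock comparison compaction uses to decide that the scanners have met. The function name is hypothetical, and the pageblock order of 10 used in the note is an assumption inferred from the 0x400-page ranges in the mm_compaction_isolate_freepages events.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model: the scanners have "met" once the free scanner is in
 * the same pageblock as the migration scanner, or behind it.
 */
static bool scanners_met(unsigned long migrate_pfn, unsigned long free_pfn,
			 unsigned int pageblock_order)
{
	assert(pageblock_order < 8 * sizeof(unsigned long));
	return (free_pfn >> pageblock_order) <=
	       (migrate_pfn >> pageblock_order);
}
```

With migrate_pfn=0xc9400, free_pfn=0xc9500 and an assumed pageblock order of 10, both pfns fall in pageblock 0x325, so the check returns true. That is consistent with the ret=complete result in the trace.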

The reclaim activity earlier in the same trace shows that reclaiming
memory was not hard.

           <...>-30045 [006] .... 17353.413941: mm_vmscan_direct_reclaim_begin: order=3 may_writepage=1 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_MEMALLOC classzone_idx=0
           <...>-30045 [006] d..1 17353.413946: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=8 nr_scanned=8 nr_skipped=0 nr_taken=8 lru=inactive_anon
           <...>-30045 [006] .... 17353.413958: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=8 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=8 nr_ref_keep=0 nr_unmap_fail=0 priority=12 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
           <...>-30045 [006] .... 17353.413960: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119119 inactive=119119 total_active=357352 active=357352 ratio=3 flags=RECLAIM_WB_ANON
           <...>-30045 [006] d..1 17353.413965: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=22 nr_scanned=22 nr_skipped=0 nr_taken=22 lru=inactive_file
           <...>-30045 [006] .... 17353.413979: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=22 nr_reclaimed=22 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=12 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
           <...>-30045 [006] .... 17353.413979: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=122195 inactive=122195 total_active=97508 active=97508 ratio=1 flags=RECLAIM_WB_FILE
           <...>-30045 [006] .... 17353.413980: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119119 inactive=119119 total_active=357352 active=357352 ratio=3 flags=RECLAIM_WB_ANON
           <...>-30045 [006] .... 17353.414134: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON
           <...>-30045 [006] .... 17353.414135: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON
           <...>-30045 [006] d..1 17353.414141: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=29 nr_scanned=29 nr_skipped=0 nr_taken=29 lru=inactive_anon
           <...>-30045 [006] .... 17353.414170: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=29 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=29 nr_ref_keep=0 nr_unmap_fail=0 priority=10 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
           <...>-30045 [006] .... 17353.414170: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119107 inactive=119107 total_active=357385 active=357385 ratio=3 flags=RECLAIM_WB_ANON
           <...>-30045 [006] d..1 17353.414176: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=32 nr_scanned=32 nr_skipped=0 nr_taken=32 lru=active_anon
           <...>-30045 [006] .... 17353.414206: mm_vmscan_lru_shrink_active: nid=0 nr_taken=32 nr_active=0 nr_deactivated=32 nr_referenced=32 priority=10 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
           <...>-30045 [006] d..1 17353.414212: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=32 nr_scanned=32 nr_skipped=0 nr_taken=32 lru=inactive_file
           <...>-30045 [006] .... 17353.414225: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=32 nr_reclaimed=32 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=10 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
           <...>-30045 [006] .... 17353.414225: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=122131 inactive=122131 total_active=97508 active=97508 ratio=1 flags=RECLAIM_WB_FILE
           <...>-30045 [006] d..1 17353.414228: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=16 nr_scanned=16 nr_skipped=0 nr_taken=16 lru=inactive_file
           <...>-30045 [006] .... 17353.414235: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=16 nr_reclaimed=16 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=10 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
           <...>-30045 [006] .... 17353.414235: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=122115 inactive=122115 total_active=97508 active=97508 ratio=1 flags=RECLAIM_WB_FILE
           <...>-30045 [006] .... 17353.414236: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119139 inactive=119139 total_active=357353 active=357353 ratio=3 flags=RECLAIM_WB_ANON
           <...>-30045 [006] .... 17353.414320: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON
           <...>-30045 [006] .... 17353.414321: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON
           <...>-30045 [006] .... 17353.414339: mm_vmscan_direct_reclaim_end: nr_reclaimed=70
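
To cross-check the total, the nr_reclaimed fields of the mm_vmscan_lru_shrink_inactive events can be summed. A minimal userspace sketch, assuming the trace lines are already available as C strings (the function name is made up for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the nr_reclaimed field over all mm_vmscan_lru_shrink_inactive
 * events in the given trace lines. Other events, including
 * mm_vmscan_direct_reclaim_end, are skipped.
 */
static unsigned long sum_reclaimed(const char *const *lines, size_t n)
{
	static const char key[] = "nr_reclaimed=";
	unsigned long total = 0;

	assert(lines != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const char *field;

		if (!strstr(lines[i], "mm_vmscan_lru_shrink_inactive:"))
			continue;
		field = strstr(lines[i], key);
		if (field)
			total += strtoul(field + strlen(key), NULL, 10);
	}
	return total;
}
```

For the excerpt above the per-batch values are 0, 22, 0, 32 and 16, which add up to the 70 pages reported by mm_vmscan_direct_reclaim_end. Every inactive_file batch was reclaimed in full, while the inactive_anon batches were all re-activated (nr_activate equals nr_scanned).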

Based on that, we can assume that if the reclaimer had reclaimed more
pages, compaction would have found free pages easily and its free
scanner would not have raced toward the migration scanner like that.
In that case the non-costly high-order allocation would not have failed.

For a non-costly high-order allocation, this patch keeps retrying
compaction (migration) together with reclaim as long as the system has
enough reclaimable memory.

Bug: 159909686
Bug: 156785617
Bug: 158449887
Minchan Kim authored and NotZeetaa committed Dec 14, 2021
1 parent 5e1b98c commit 941c226
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions mm/page_alloc.c
@@ -4133,6 +4133,12 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 		(*compact_priority)--;
 		*compaction_retries = 0;
 		ret = true;
+	} else if (order <= PAGE_ALLOC_COSTLY_ORDER) {
+		/*
+		 * If it's non-alloc-costly order and has enough reclaimable
+		 * memory, retries further to prevent premature OOM kill.
+		 */
+		ret = compaction_zonelist_suitable(ac, order, alloc_flags);
 	}
 out:
 	trace_compact_retry(order, priority, compact_result, retries, max_retries, ret);