Description
I recently hit an out-of-memory (OOM) kill while running a Fish node with AWS workloads. The memory leak may be in the AWS driver, and possibly in the Fish logic itself as well.
Steps to Reproduce
- Run a Fish node and load it with requests to Allocate/Deallocate AWS instances
- Wait: after about a month the process is killed by the kernel OOM killer
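Because the failure takes about a month to appear, sampling the resident memory of the process periodically may confirm steady growth much earlier. The following is only a minimal sketch, assuming a Linux host where `/proc/<pid>/status` exposes a `VmRSS` line. The helper names `parse_vm_rss_kb` and `sample_rss` are hypothetical and not part of Fish.

```python
import re
import time
from typing import List, Optional


def parse_vm_rss_kb(status_text: str) -> Optional[int]:
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def sample_rss(pid: int, interval_s: float, samples: int) -> List[Optional[int]]:
    """Read VmRSS `samples` times, `interval_s` seconds apart (Linux only)."""
    readings = []
    for i in range(samples):
        with open(f"/proc/{pid}/status") as status_file:
            readings.append(parse_vm_rss_kb(status_file.read()))
        # Sleep between samples, but not after the last one.
        if i + 1 < samples:
            time.sleep(interval_s)
    return readings
```

For example, `sample_rss(pid, 3600.0, 24)` would collect one reading per hour for a day. Readings that keep rising under a constant Allocate/Deallocate load would point to a leak rather than a one-off spike.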
Platform and Version
- Ubuntu 20.04.6 LTS
- Aquarium Fish v0.7.1 (231111.070935)
Logs taken while reproducing problem
Fish log:
May 03 15:27:34 host aquarium-fish[2535560]: INFO: Fish: Start executing Application 12050150-63b1-463f-a0c0-340be13ab1bc NEW
May 03 15:27:34 host aquarium-fish[2535560]: INFO: Fish: Allocate the resource using the driver aws
May 03 15:27:34 host aquarium-fish[2535560]: INFO: AWS: Selected security group: sg-0c23fd513c198d0c9 fish-14e08c41499b
May 03 15:27:34 host aquarium-fish[2535560]: INFO: AWS: Selected snapshot: snap-03ad771fb3259b032 fish-14e08c41499b
May 03 15:27:35 host systemd[1]: aquarium-fish.service: Main process exited, code=killed, status=9/KILL
May 03 15:27:35 host systemd[1]: aquarium-fish.service: Failed with result 'signal'.
May 03 15:27:35 host systemd[1]: aquarium-fish.service: Scheduled restart job, restart counter is at 1.
May 03 15:27:35 host systemd[1]: Stopped Run aquarium-fish node service as unprevileged user.
May 03 15:27:35 host systemd[1]: Started Run aquarium-fish node service as unprevileged user.
May 03 15:27:36 host aquarium-fish[3339182]: INFO: Aquarium Fish v0.7.1 (231111.070935)
May 03 15:27:36 host aquarium-fish[3339182]: INFO: Fish init TLS...
May 03 15:27:36 host aquarium-fish[3339182]: INFO: Fish starting ORM...
...
Dmesg OOM:
[17237754.236655] aquarium-fish invoked oom-killer: gfp_mask=0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
[17237754.236662] CPU: 0 PID: 2535578 Comm: aquarium-fish Tainted: P E 5.4.0-121-generic #137-Ubuntu
[17237754.236663] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.10.2-1.1~u16.04+mcp2 04/01/2014
[17237754.236675] Call Trace:
[17237754.237606] dump_stack+0x6d/0x8b
[17237754.238009] dump_header+0x4f/0x1eb
[17237754.238011] oom_kill_process.cold+0xb/0x10
[17237754.238288] out_of_memory+0x1cf/0x4d0
[17237754.238426] __alloc_pages_slowpath+0xd5e/0xe50
[17237754.238670] ? x2apic_send_IPI_mask+0x13/0x20
[17237754.238672] __alloc_pages_nodemask+0x2d0/0x320
[17237754.238674] alloc_pages_vma+0x7f/0x220
[17237754.238677] wp_page_copy+0x45b/0xa00
[17237754.238678] do_wp_page+0x94/0x6a0
[17237754.238683] ? __switch_to_asm+0x34/0x70
[17237754.238684] ? __switch_to_asm+0x40/0x70
[17237754.238686] __handle_mm_fault+0x771/0x7a0
[17237754.238687] handle_mm_fault+0xca/0x200
[17237754.238690] do_user_addr_fault+0x1f9/0x450
[17237754.238691] __do_page_fault+0x58/0x90
[17237754.238969] ? exit_to_usermode_loop+0x8f/0x160
[17237754.238970] do_page_fault+0x2c/0xe0
[17237754.238972] do_async_page_fault+0x39/0x70
[17237754.238974] async_page_fault+0x34/0x40
[17237754.238988] RIP: 0033:0xa035d7
[17237754.238991] Code: 24 18 48 8b 4c 24 20 48 c7 c5 80 00 00 00 f3 0f 6f 00 f3 0f 6f 0b f3 0f 6f 11 f3 0f 6f 1a 66 0f ef c1 66 0f ef c2 66 0f ef c3 <f3> 0f 7f 02 48 83 c0 10 48 83 c3 10 48 83 c1 10 48 83 c2 10 48 83
[17237754.238992] RSP: 002b:000000c000604e80 EFLAGS: 00010202
[17237754.238993] RAX: 000000c3dd110c00 RBX: 000000c3d99dec00 RCX: 000000c000604f18
[17237754.238994] RDX: 000000c3dd111000 RSI: 000000c000604f18 RDI: 000000c000605318
[17237754.238994] RBP: 0000000000000080 R08: 000000c3c07d4000 R09: 000000001c93cc00
[17237754.238995] R10: 00000000000124f4 R11: 000000000000dcc7 R12: 000000c000604f08
[17237754.238995] R13: 0000000000080000 R14: 000000c00060f860 R15: 0000000000000003
[17237754.239068] Mem-Info:
[17237754.239089] active_anon:3638021 inactive_anon:364553 isolated_anon:543
active_file:118 inactive_file:187 isolated_file:0
unevictable:0 dirty:0 writeback:7 unstable:0
slab_reclaimable:13933 slab_unreclaimable:23964
mapped:209 shmem:546 pagetables:9136 bounce:0
free:33675 free_pcp:19 free_cma:0
[17237754.239092] Node 0 active_anon:14552084kB inactive_anon:1458212kB active_file:472kB inactive_file:748kB unevictable:0kB isolated(anon):2172kB isolated(file):0kB mapped:836kB dirty:0kB writeback:28kB shmem:2184kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 11415552kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[17237754.239092] Node 0 DMA free:15908kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[17237754.239239] lowmem_reserve[]: 0 2991 15993 15993 15993
[17237754.239241] Node 0 DMA32 free:64208kB min:12576kB low:15720kB high:18864kB active_anon:2995624kB inactive_anon:40kB active_file:84kB inactive_file:88kB unevictable:0kB writepending:0kB present:3129192kB managed:3063656kB mlocked:0kB kernel_stack:0kB pagetables:3316kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[17237754.239243] lowmem_reserve[]: 0 0 13001 13001 13001
[17237754.239244] Node 0 Normal free:54584kB min:54940kB low:68672kB high:82404kB active_anon:11556980kB inactive_anon:1458172kB active_file:388kB inactive_file:464kB unevictable:0kB writepending:28kB present:13631488kB managed:13313812kB mlocked:0kB kernel_stack:4496kB pagetables:33228kB bounce:0kB free_pcp:88kB local_pcp:0kB free_cma:0kB
[17237754.239246] lowmem_reserve[]: 0 0 0 0 0
[17237754.239247] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
[17237754.239252] Node 0 DMA32: 73*4kB (UM) 58*8kB (UM) 22*16kB (UM) 14*32kB (UM) 5*64kB (UM) 2*128kB (UM) 0*256kB 1*512kB (M) 2*1024kB (UM) 1*2048kB (U) 14*4096kB (ME) = 64084kB
[17237754.239256] Node 0 Normal: 4563*4kB (UMEH) 2862*8kB (UMEH) 676*16kB (UMEH) 96*32kB (UME) 1*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 55100kB
[17237754.239280] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[17237754.239280] 2099 total pagecache pages
[17237754.239284] 1219 pages in swap cache
[17237754.239285] Swap cache stats: add 135784, delete 134572, find 10926656/10927674
[17237754.239285] Free swap = 0kB
[17237754.239286] Total swap = 483800kB
[17237754.239286] 4194168 pages RAM
[17237754.239286] 0 pages HighMem/MovableOnly
[17237754.239287] 95824 pages reserved
[17237754.239290] 0 pages cma reserved
[17237754.239290] 0 pages hwpoisoned
[17237754.239291] Tasks state (memory values in pages):
[17237754.239291] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
----8<---->8-----
[17237754.239313] [ 452] 0 452 2424 61 53248 3 0 cron
[17237754.239315] [ 453] 103 453 1930 242 53248 12 -900 dbus-daemon
[17237754.239319] [ 462] 0 462 20473 36 61440 39 0 irqbalance
[17237754.239324] [ 463] 0 463 8364 2220 94208 69 0 networkd-dispat
[17237754.239325] [ 470] 0 470 12679 1234 126976 4145 0 salt-minion
[17237754.239327] [ 474] 0 474 4342 124 73728 113 0 systemd-logind
[17237754.239329] [ 500] 0 500 2127 0 49152 30 0 agetty
[17237754.239331] [ 522] 0 522 2184 24 49152 4 0 agetty
[17237754.239332] [ 535] 109 535 1205 66 45056 13 0 chronyd
[17237754.239334] [ 546] 109 546 1172 29 45056 16 0 chronyd
[17237754.239340] [ 567] 0 567 58412 309 86016 42 0 polkitd
[17237754.239341] [ 645] 0 645 27711 1159 110592 755 0 unattended-upgr
----8<---->8-----
[17237754.239347] [ 799] 0 799 622 1 45056 17 0 none
----8<---->8-----
[17237754.239355] [ 500099] 0 500099 4846 91 57344 192 -1000 systemd-udevd
[17237754.239357] [ 500230] 106 500230 2725 13 53248 30 0 uuidd
[17237754.239359] [ 500334] 0 500334 376580 2488 307200 948 -999 containerd
[17237754.239361] [ 500418] 0 500418 59639 74 94208 134 0 accounts-daemon
[17237754.239363] [ 500744] 100 500744 8900 40 81920 202 0 systemd-network
[17237754.239365] [ 500800] 101 500800 6106 43 86016 909 0 systemd-resolve
[17237754.239366] [ 500811] 0 500811 100004 281 790528 1 -250 systemd-journal
[17237754.239368] [ 501322] 0 501322 384968 1578 397312 4592 -500 dockerd
[17237754.239373] [ 507353] 0 507353 3046 52 65536 180 -1000 sshd
[17237754.239375] [ 509726] 104 509726 56079 248 77824 113 0 rsyslogd
[17237754.239377] [2535560] 1002 2535560 4722347 3929746 32657408 102647 0 aquarium-fish
[17237754.239382] [3228804] 1001 3228804 72099 14784 528384 805 0 splunkd
[17237754.239384] [3228834] 1001 3228834 19410 1117 139264 942 0 splunkd
[17237754.239397] [3339170] 0 3339170 4846 30 53248 257 0 systemd-udevd
[17237754.239399] [3339171] 0 3339171 4846 54 53248 233 0 systemd-udevd
[17237754.239400] [3339172] 0 3339172 4846 65 53248 222 0 systemd-udevd
[17237754.239402] [3339173] 0 3339173 4846 120 53248 167 0 systemd-udevd
[17237754.239403] [3339174] 0 3339174 4846 97 53248 190 0 systemd-udevd
[17237754.239405] [3339175] 0 3339175 4846 99 53248 188 0 systemd-udevd
[17237754.239406] [3339176] 0 3339176 4846 134 53248 153 0 systemd-udevd
[17237754.239407] [3339177] 0 3339177 4846 2 53248 286 0 systemd-udevd
[17237754.239411] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/aquarium-fish.service,task=aquarium-fish,pid=2535560,uid=1002
[17237754.239476] Out of memory: Killed process 2535560 (aquarium-fish) total-vm:18889388kB, anon-rss:15718984kB, file-rss:0kB, shmem-rss:0kB, UID:1002 pgtables:31892kB oom_score_adj:0
[17237754.493164] oom_reaper: reaped process 2535560 (aquarium-fish), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
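The task table above reports memory in pages, while the kill line reports kB. The short check below ties the two together and shows how much of the host's RAM the aquarium-fish process was holding when it was killed. It is a sketch only: the 4 kB page size is an assumption (the usual x86-64 default) that the log does not state directly, although the figures are consistent with it. All numbers are copied from the dmesg output.

```python
# Assumed page size in kB (x86-64 default); not stated explicitly in the log.
PAGE_KB = 4

total_ram_pages = 4194168  # "4194168 pages RAM"
fish_rss_pages = 3929746   # rss column of the aquarium-fish row (pid 2535560)
fish_vm_pages = 4722347    # total_vm column of the same row


def pages_to_kb(pages: int) -> int:
    """Convert a page count from the task table into kB."""
    return pages * PAGE_KB


fish_rss_kb = pages_to_kb(fish_rss_pages)
fish_vm_kb = pages_to_kb(fish_vm_pages)
rss_share = fish_rss_pages / total_ram_pages

print(f"aquarium-fish RSS:      {fish_rss_kb} kB")
print(f"aquarium-fish total_vm: {fish_vm_kb} kB")
print(f"share of physical RAM:  {rss_share:.1%}")
```

Under that assumption the converted values equal the `anon-rss:15718984kB` and `total-vm:18889388kB` figures in the kill line, so a single process accounted for nearly all of the roughly 16 GiB of RAM, with swap already exhausted (`Free swap = 0kB`).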