Error Exception
Failing this attempt. Diagnostics: [2023-08-22 08:49:10.443] Container [pid=77475,containerID=container_e08_1683881703260_1165_01_000011] is running 9510912B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 3.2 GB of 2.1 GB virtual memory used. Killing container.
Thank you for the detailed feedback. StreamPark is merely a platform for managing and submitting Flink jobs; you can look into the Flink job itself to investigate further.
Search before asking
Java Version
1.8.0_212
Scala Version
2.12.x
StreamPark Version
2.0.0
Flink Version
1.15.4
Deploy Mode
yarn-application
What happened
When I submit a Flink on YARN (YARN application mode) task using StreamPark, the JobManager's memory parameters look like this:
jobmanager.memory.heap.size 469762048b
jobmanager.memory.jvm-metaspace.size 268435456b
jobmanager.memory.jvm-overhead.max 201326592b
jobmanager.memory.jvm-overhead.min 201326592b
jobmanager.memory.off-heap.size 134217728b
jobmanager.memory.process.size 1024mb
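As a sanity check, Flink's JobManager memory model derives the total process size from the sum of heap, off-heap, metaspace, and JVM overhead. A quick sketch confirming the values listed above add up to the configured 1024mb:

```python
# JobManager memory components exactly as configured above (bytes).
heap = 469_762_048          # jobmanager.memory.heap.size
off_heap = 134_217_728      # jobmanager.memory.off-heap.size
metaspace = 268_435_456     # jobmanager.memory.jvm-metaspace.size
jvm_overhead = 201_326_592  # jobmanager.memory.jvm-overhead.{min,max} are equal, so fixed

total = heap + off_heap + metaspace + jvm_overhead
print(total // (1024 * 1024), "MiB")  # 1024 MiB == jobmanager.memory.process.size
```

So the configuration is internally consistent; the container kill below is YARN enforcing this 1 GB cap, not a misconfigured total.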
After the task runs for a period of time (about 3 to 20 days), the container running the JobManager is invariably killed by the ResourceManager. I then enabled the JobManager's GC log and found that the JobManager performs a young-generation GC roughly every 2 minutes, as follows:
2023-08-30T13:56:57.694+0800: [GC (Allocation Failure) [PSYoungGen: 149956K->1673K(150528K)] 315127K->166876K(456704K), 0.0138514 secs] [Times: user=0.54 sys=0.05, real=0.02 secs]
2023-08-30T13:59:17.558+0800: [GC (Allocation Failure) [PSYoungGen: 150141K->1636K(150528K)] 315344K->166871K(456704K), 0.0285263 secs] [Times: user=1.20 sys=0.11, real=0.03 secs]
...
2023-08-30T14:47:54.412+0800: [GC (Allocation Failure) [PSYoungGen: 148425K->1700K(150016K)] 314796K->168135K(456192K), 0.0258613 secs] [Times: user=0.96 sys=0.06, real=0.03 secs]
2023-08-30T14:50:12.434+0800: [GC (Allocation Failure) [PSYoungGen: 149138K->1156K(150016K)] 315573K->167607K(456192K), 0.0233593 secs] [Times: user=0.77 sys=0.07, real=0.03 secs]
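The roughly two-minute cadence can be confirmed by parsing the leading `-XX:+PrintGCDateStamps` timestamps. A small sketch (the helper name is my own, not from any Flink tooling):

```python
import re
from datetime import datetime

# Two consecutive young-gen GC lines copied from the JobManager's GC log above.
log_lines = [
    "2023-08-30T13:56:57.694+0800: [GC (Allocation Failure) [PSYoungGen: 149956K->1673K(150528K)] 315127K->166876K(456704K), 0.0138514 secs]",
    "2023-08-30T13:59:17.558+0800: [GC (Allocation Failure) [PSYoungGen: 150141K->1636K(150528K)] 315344K->166871K(456704K), 0.0285263 secs]",
]

def gc_timestamps(lines):
    """Extract the leading ISO-8601 date stamps emitted by -XX:+PrintGCDateStamps."""
    stamps = []
    for line in lines:
        m = re.match(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})([+-]\d{4})", line)
        if m:
            stamps.append(datetime.strptime(m.group(1) + m.group(2), "%Y-%m-%dT%H:%M:%S.%f%z"))
    return stamps

stamps = gc_timestamps(log_lines)
interval = (stamps[1] - stamps[0]).total_seconds()
print(f"seconds between young-gen GCs: {interval:.3f}")  # ~140 s, i.e. about every 2 minutes
```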
To understand the cause of the JobManager's frequent GC, I dumped the objects in the JobManager's Java heap to a local file and opened it with VisualVM for analysis. char[] occupies the largest share of the heap, as shown in the following figure:
What could cause this? If we submit the task from the command line with FLINK_HOME/bin/flink run -t yarn-per-job (with exactly the same program parameters as above), far fewer char[] objects are created, and the JobManager only performs a GC about once every 40 minutes, which seems normal.
As for the containers being frequently killed, we will set jobmanager.memory.enable-jvm-direct-memory-limit = true to guard against exceeding the memory limit. Does anyone know whether this parameter helps prevent containers from being killed for exceeding the memory limit?
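For reference, that option goes in flink-conf.yaml. When enabled, Flink passes -XX:MaxDirectMemorySize to the JVM, so runaway direct-memory allocations fail with an OutOfMemoryError inside the JVM instead of silently growing the process until YARN kills the container. A minimal fragment (only the first line is from the report above; the overhead increase is an illustrative assumption, since jvm-overhead is a common knob when containers exceed their physical memory limit):

```yaml
# Cap direct memory at the configured off-heap size (adds -XX:MaxDirectMemorySize).
jobmanager.memory.enable-jvm-direct-memory-limit: true

# Illustrative assumption: leave more headroom for non-JVM-tracked memory.
jobmanager.memory.jvm-overhead.min: 256mb
jobmanager.memory.jvm-overhead.max: 256mb
```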
Error Exception
Screenshots
No response
Are you willing to submit PR?
Code of Conduct