fix mem-burn process been killed too early by kernel oom_killer #89

Merged
merged 3 commits into from
Jul 29, 2021

buhuipao
Contributor

Describe what this PR does / why we need it

When mem-burn runs in ram mode and the reserve parameter is too small, the kernel OOM killer kills the mem-burn process before memory usage reaches the target retention value. To prevent this, I write the value '-1000' to the mem-burn process's oom_score_adj file.

Does this pull request fix one issue?

Describe how you did it

Describe how to verify it

Special notes for reviews

Signed-off-by: aliverchen <aliverchen@tencent.com>
@ioworker0
Contributor

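As far as I am concerned, this could be very dangerous.

Sorry, I would like to share a few personal thoughts.
This is only a memory-occupation fault. Lowering its oom_score_adj increases the probability that innocent processes get killed instead.
In chaos engineering, the "minimal blast radius" should remain controllable.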

@xcaspar
Member

xcaspar commented Jun 18, 2021

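As far as I am concerned, this could be very dangerous.

Sorry, I would like to share a few personal thoughts.
This is only a memory-occupation fault. Lowering its oom_score_adj increases the probability that innocent processes get killed instead.
In chaos engineering, the "minimal blast radius" should remain controllable.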

I once ran into a production incident in which an operations process used too much memory and caused the business application process to be killed. This PR simulates that failure more faithfully, but the behavior should be exposed as a configuration parameter.

@xcaspar
Member

xcaspar commented Jun 18, 2021

@buhuipao Please modify the username used when committing. And can you extract a flag to enable this feature? Thanks~
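
As an illustration of the request above, an opt-in flag could be wired up along these lines. This is a minimal sketch using Go's standard `flag` package, not the change that was merged. The flag name `protect-from-oom`, the `burnOptions` type, and both helper functions are hypothetical.

```go
package main

import (
	"flag"
)

// burnOptions holds the (hypothetical) settings for one memory-burn run.
type burnOptions struct {
	protectFromOOM bool
}

// parseBurnOptions parses command-line arguments. OOM protection stays off
// unless the caller explicitly asks for it, which keeps the blast radius
// of the default experiment small.
func parseBurnOptions(args []string) (burnOptions, error) {
	var opts burnOptions
	fs := flag.NewFlagSet("mem-burn", flag.ContinueOnError)
	fs.BoolVar(&opts.protectFromOOM, "protect-from-oom", false,
		"write -1000 to oom_score_adj so the burn process is not OOM-killed first")
	if err := fs.Parse(args); err != nil {
		return burnOptions{}, err
	}
	return opts, nil
}

// targetOOMScoreAdj reports which value to write to oom_score_adj, and
// whether anything should be written at all.
func targetOOMScoreAdj(opts burnOptions) (score int, write bool) {
	if !opts.protectFromOOM {
		return 0, false // leave the kernel default untouched
	}
	return -1000, true
}
```

With the default off, the burn process remains an ordinary OOM-killer candidate, which addresses the blast-radius concern. Passing the flag reproduces the scenario in which other processes are killed first.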

@ioworker0
Contributor

As far as I am concerned, this could be very dangerous.

Sorry, I would like to share a few personal thoughts.
This is only a memory-occupation fault. Lowering its oom_score_adj increases the probability that innocent processes get killed instead.
In chaos engineering, the "minimal blast radius" should remain controllable.

I once ran into a production incident in which an operations process used too much memory and caused the business application process to be killed. This PR simulates that failure more faithfully, but the behavior should be exposed as a configuration parameter.

Yep, I agree with you.

@xcaspar xcaspar self-requested a review June 18, 2021 06:26
aliverchen and others added 2 commits June 22, 2021 16:58
@xcaspar xcaspar added this to the v1.3.0 milestone Jun 24, 2021
@buhuipao
Contributor Author

@buhuipao Please modify the username used when committing. And can you extract a flag to enable this feature? Thanks~

Done.

@xcaspar
Member

xcaspar commented Jun 25, 2021

@buhuipao OK, thanks for your contribution, I will review it this week.

@xcaspar xcaspar changed the base branch from master to 1.3.0-dev July 29, 2021 03:47
@xcaspar xcaspar merged commit 7f6cf89 into chaosblade-io:1.3.0-dev Jul 29, 2021
xcaspar pushed a commit that referenced this pull request Aug 4, 2021
* fix mem-burn process been killed too early by kernel oom_killer

Signed-off-by: aliverchen <aliverchen@tencent.com>

* fix mem-burn process been killed too early by kernel oom_killer

Signed-off-by: aliverchen <aliverchen@tencent.com>

Co-authored-by: aliverchen <aliverchen@tencent.com>
3 participants