Skip to content

Conversation

@pfliu
Copy link
Collaborator

@pfliu pfliu commented Jun 26, 2025

At present, the deduction of the proper crashkernel value depends on
detecting platform details. However, more and more platforms are now
deployed using OSTree images, which means kdump-utils cannot retrieve
complete platform information. This often leads to an underestimation of
the memory required by the kdump kernel, increasing the risk of
out-of-memory (OOM) issues.

To mitigate this situation, we choose to maximize the reserved
crashkernel size in the osbuild case, and recommend users to update the
kernel command line and release the excess reserved memory after reboot.

That is done by: kdumpctl stop; echo $_recommend_value >/sys/kernel/kexec_crash_size; kdumpctl start

Summary by Sourcery

Force the crashkernel recommendation to its maximum when running on OSTree-based deployments by bypassing platform-specific heuristics.

Enhancements:

  • Introduce a maximize_crashkernel flag to disable platform detection under OSTree images
  • Guard SME/SEV, MLX5 core, and ARM SMMU checks behind maximize_crashkernel to prevent underestimation
  • Add is_aarch64_64k_kernel helper and apply the maximize_crashkernel flag to 64k kernel suffix detection
  • Enable maximize_crashkernel in kdump_get_arch_recommend_crashkernel when running on OSTree

@sourcery-ai
Copy link

sourcery-ai bot commented Jun 26, 2025

Reviewer's Guide

Introduced a maximize_crashkernel flag triggered for OSTree deployments to bypass hardware-specific checks and guarantee maximum crashkernel reservation; added a helper for detecting 64k aarch64 kernels and updated recommendation logic accordingly.

Class diagram for maximize_crashkernel logic and related helpers

classDiagram
    class kdump-lib.sh {
        +maximize_crashkernel: int
        +is_uki()
        +is_aws_aarch64()
        +is_sme_or_sev_active()
        +has_command()
        +get_recommend_size()
        +has_mlx5()
        +has_aarch64_smmu()
        +is_aarch64_64k_kernel(_kernel)
        +is_memsize()
        +kdump_get_arch_recommend_crashkernel()
    }
    kdump-lib.sh : maximize_crashkernel set by is_ostree()
    kdump-lib.sh : is_aarch64_64k_kernel used in kdump_get_arch_recommend_crashkernel()
    kdump-lib.sh : maximize_crashkernel disables hardware checks in helpers
Loading

Class diagram for is_aarch64_64k_kernel helper

classDiagram
    class is_aarch64_64k_kernel {
        +is_aarch64_64k_kernel(_kernel)
    }
    kdump-lib.sh --> is_aarch64_64k_kernel : uses
Loading

File-Level Changes

Change Details Files
Introduce maximize_crashkernel override for OSTree images
  • Initialize maximize_crashkernel to 0
  • Set maximize_crashkernel based on is_ostree in recommendation function
kdump-lib.sh
Short-circuit platform detection when maximizing reservation
  • Guard is_sme_or_sev_active with maximize_crashkernel flag
  • Guard has_mlx5 with maximize_crashkernel flag
  • Guard has_aarch64_smmu with maximize_crashkernel flag
kdump-lib.sh
Add helper to detect 64k-page aarch64 kernels
  • Define is_aarch64_64k_kernel that respects maximize_crashkernel
  • Use helper instead of inline grep for kernel suffix check
kdump-lib.sh

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @pfliu - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `kdump-lib.sh:1005` </location>
<code_context>
 	local _arch _ck_cmdline _dump_mode
 	local _delta=0

+	# osbuild deploies rpm on chroot environment. kdump-utils has no opportunity
+	# to deduce the exact memory cost on the real target.
+	maximize_crashkernel=$(is_ostree)
 	if [[ -z "$1" ]]; then
</code_context>

<issue_to_address>
Typo in comment: 'deploies' should be 'deploys'.

Please update the comment to fix the typo.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
	# osbuild deploies rpm on chroot environment. kdump-utils has no opportunity
	# to deduce the exact memory cost on the real target.
=======
	# osbuild deploys rpm on chroot environment. kdump-utils has no opportunity
	# to deduce the exact memory cost on the real target.
>>>>>>> REPLACE

</suggested_fix>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@pfliu
Copy link
Collaborator Author

pfliu commented Jun 26, 2025

I think even for local dump, if the OSTree image, containing both 4K and 64K variant, is built on aarch64-4K platform and finally the user deploys the 64K variant, we will face the OOM issue

@coiby
Copy link
Member

coiby commented Aug 7, 2025

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a smart workaround for memory estimation on OSTree systems by maximizing the crashkernel size and providing a mechanism for users to adjust it. The overall logic is sound. My review identifies a couple of critical shell scripting errors that would cause the script to fail at runtime. I've also included suggestions to improve code maintainability by properly scoping variables within functions. Addressing these points will make the implementation robust and reliable.

kdumpctl Outdated
Comment on lines 1106 to 1174
local _crashkernel
local _recommend
local _recommend_value
local _actual_value

_crashkernel=$(sed -n 's/.*crashkernel=\([^ ]*\).*/\1/p' /proc/cmdline)
[[ $(kdump_get_conf_val auto_reset_crashkernel) == no ]] && return

if ! echo "$_crashkernel" | grep -q -E "(fadump|fadump=0)"; then
_recommend=$(kdump_get_arch_recommend_crashkernel "kdump")
_recommend_value=$(get_recommend_crashkernel_value "$_recommend")
_low_mem=$(get_crashkernel_low_value)
# The crashkernel formula does not include the crashkernel low value
# but the kexec_crash_size does.
_recommend_value=$(memsize_add "$_recommend_value" "$_low_mem")
_recommend_value=$(to_bytes "$_recommend_value")
_actual_value=$(cat /sys/kernel/kexec_crash_size)

if [ "$_recommend_value" -lt "$_actual_value" ]; then
dwarn "The reserved crashkernel is abundant. Using 'kdumpctl reset-crashkernel' to reset kernel cmdline. It will take effect in the next boot"
dwarn "To release the abundant memory immediately. You can do: 'kdumpctl stop', 'echo $_recommend_value >/sys/kernel/kexec_crash_size', and finally 'kdump start'"
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The variable _low_mem is not declared as local. It's good practice to declare all function-scoped variables as local to avoid potential side effects and improve code clarity.

local _crashkernel _recommend _recommend_value _actual_value _low_mem

kdumpctl Outdated
_low_mem=$(get_crashkernel_low_value)
# The crashkernel formula does not include the crashkernel low value
# but the kexec_crash_size does.
_recommend_value=$(memsize_add "$_recommend_value" "$_low_mem")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://docs.kernel.org/arch/arm64/kdump.html#crashkernel-size, for crashkernel=size, the kernel searches the low memory area by default. So crashkernel does include the crashkernel low value. So I think the logic here doesn't make sense.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The system's behavior diverges. If low memory is enough, there will no extra crashkernel,low is allocated. This is the case you mentioned here. But if low memory is not enough, the crashkernel=size specifies the volume of the memory reserved in high memory, not including the extra 128M crashkernel,low

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the original solution, if there is no extra crashkernel.low is reserved, get_crashkernel_low_value() will 'echo 0', which means zero volume is reserved. And it can cover the 1st case

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation! Combined with a few experiments done on crashkernel, now I see what you mean. But get_crashkernel_low_value and the comment "The crashkernel formula does not include the crashkernel low value" are still confusing. Maybe you can change the function name to improve the code readability.

Btw, I'm curious to ask when will the case where low memory is not enough happen? For aarch64, in my experiment, even crashkernel=2G won't use the high memory,

[root@rhel10 ~]# kdumpctl showmem
kdump: Reserved 2048MB memory for crash kernel
[root@rhel10 ~]# cat /proc/cmdline 
root=UUID=3e69ec88-a92a-4de9-a000-e394a3f7d3ea console=tty0 console=ttyS0,115200n8 no_timer_check systemd.firstboot=off crashkernel=2G
[root@rhel10 ~]# grep Crash /proc/iomem 
  80000000-ffffffff : Crash kernel

Another question is I don't understand why crashkernel=512M,low fails but crashkernel=512M works or is it a known bug?

[root@rhel10 ~]# uname -r
6.12.0-30.el10.aarch64


[root@rhel10 ~]# grep Crash /proc/iomem 
[root@rhel10 ~]# cat /proc/cmdline 
root=UUID=3e69ec88-a92a-4de9-a000-e394a3f7d3ea console=tty0 console=ttyS0,115200n8 no_timer_check systemd.firstboot=off crashkernel=512M,low
[root@rhel10 ~]# kdumpctl showmem
kdump: Reserved 0MB memory for crash kernel

[root@rhel10 ~]# cat /proc/cmdline 
root=UUID=3e69ec88-a92a-4de9-a000-e394a3f7d3ea console=tty0 console=ttyS0,115200n8 no_timer_check systemd.firstboot=off crashkernel=512M
[root@rhel10 ~]# grep Crash /proc/iomem 
  e0000000-ffffffff : Crash kernel
[root@rhel10 ~]# kdumpctl showmem
kdump: Reserved 512MB memory for crash kernel

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have another question. I have impression that the default crashkernel value specify the total memory required for crash kernel. Let's say crashkernel=2G-4G:256M,4G-64G:320M,64G-:576M and system RAM is larger than 64G. When low memory is not enough, the memory reserved will become 576M + 128M. Does the system really need an extra 128M or is 576M sufficient? If 576M is sufficient, then there is no need to for get_crashkernel_low_value and related code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @pfliu I have a third question which somehow isn't show together with other comments.

Copy link
Collaborator Author

@pfliu pfliu Sep 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have another question. I have impression that the default crashkernel value specify the total memory required for crash kernel. Let's say crashkernel=2G-4G:256M,4G-64G:320M,64G-:576M and system RAM is larger than 64G. When low memory is not enough, the memory reserved will become 576M + 128M. Does the system really need an extra 128M or is 576M sufficient? If 576M is sufficient, then there is no need to for get_crashkernel_low_value and related code.

The extra 128MB is used for bounced DMA, i.e DMA stores data to there and later kernel copies the data to the final destination . I.e if the system has device only able to address 32-bits, but the crashkernel reserves memory all beyond 4GB.

In a short word, 576M is big enough, but without the low 128MB, some hardware device may not work

@coiby
Copy link
Member

coiby commented Aug 7, 2025

Hi @pfliu,

I've reviewed this PR and I think some changes are also needed. You will also need to address feedback from Gemini because some critical issues are found by Gemini. Thanks!

@pfliu
Copy link
Collaborator Author

pfliu commented Aug 27, 2025

@coiby, sorry that I just see your comment, thanks a lot. I will update it later

@pfliu pfliu force-pushed the fix_osbuild branch 2 times, most recently from d0be1ff to ca54f3b Compare August 29, 2025 01:46
@pfliu
Copy link
Collaborator Author

pfliu commented Aug 29, 2025

@coiby, I have updated patches according to your comments. It should resolve all issues except the crashkernel low, which I think it is correct.
When you have time, could you help to review it again?
Thanks

@pfliu
Copy link
Collaborator Author

pfliu commented Sep 1, 2025

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively addresses the underestimation of crashkernel memory in osbuild environments by maximizing the reserved size during the build process. The implementation is solid, introducing a maximize_crashkernel flag to control platform-specific checks and adding a new is_osbuild helper to detect the build environment. The addition of suggest_crashkernel_reset to inform users about the abundant memory on deployed systems is a thoughtful touch. The code is well-structured, and I have one suggestion to improve performance by replacing external process calls with shell built-ins.

kdumpctl Outdated
_low_mem=$(get_crashkernel_low_value)
# The crashkernel formula does not include the crashkernel low value
# but the kexec_crash_size does.
_recommend_value=$(memsize_add "$_recommend_value" "$_low_mem")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation! Combined with a few experiments done on crashkernel, now I see what you mean. But get_crashkernel_low_value and the comment "The crashkernel formula does not include the crashkernel low value" are still confusing. Maybe you can change the function name to improve the code readability.

Btw, I'm curious to ask when will the case where low memory is not enough happen? For aarch64, in my experiment, even crashkernel=2G won't use the high memory,

[root@rhel10 ~]# kdumpctl showmem
kdump: Reserved 2048MB memory for crash kernel
[root@rhel10 ~]# cat /proc/cmdline 
root=UUID=3e69ec88-a92a-4de9-a000-e394a3f7d3ea console=tty0 console=ttyS0,115200n8 no_timer_check systemd.firstboot=off crashkernel=2G
[root@rhel10 ~]# grep Crash /proc/iomem 
  80000000-ffffffff : Crash kernel

Another question is I don't understand why crashkernel=512M,low fails but crashkernel=512M works or is it a known bug?

[root@rhel10 ~]# uname -r
6.12.0-30.el10.aarch64


[root@rhel10 ~]# grep Crash /proc/iomem 
[root@rhel10 ~]# cat /proc/cmdline 
root=UUID=3e69ec88-a92a-4de9-a000-e394a3f7d3ea console=tty0 console=ttyS0,115200n8 no_timer_check systemd.firstboot=off crashkernel=512M,low
[root@rhel10 ~]# kdumpctl showmem
kdump: Reserved 0MB memory for crash kernel

[root@rhel10 ~]# cat /proc/cmdline 
root=UUID=3e69ec88-a92a-4de9-a000-e394a3f7d3ea console=tty0 console=ttyS0,115200n8 no_timer_check systemd.firstboot=off crashkernel=512M
[root@rhel10 ~]# grep Crash /proc/iomem 
  e0000000-ffffffff : Crash kernel
[root@rhel10 ~]# kdumpctl showmem
kdump: Reserved 512MB memory for crash kernel

@pfliu
Copy link
Collaborator Author

pfliu commented Sep 4, 2025

Hi Coiby

For the first question: where low memory is not enough happen?
Before crashkernel= is handled by the kernel, there are several subsystem has been initialized. If they occupy too many memory below 4G, there will be no enough memory below 4G left for crashkernel=.
Also, there are some machines with sparse physical memory below 4G. It is short of memory below 4GB.

For the second question why crashkernel=512M,low fails but crashkernel=512M works or is it a known bug?
It is not a bug. crashkernel=512M,low will only try its luck under 4G. On the other hand, crashkernel=512M will try it luck under 4G in the first place, if it fails, it will try its luck from memory address greater than 4G. In the latter case, it only reserves additional 128MB below 4GB for bounce-DMA.

I will improve the notes around the get_crashkernel_low_value().

Besides that, I just realize that this solution should skip the 2G-4G:256M. Because in this case, it usually a guest without smmu or mlx netcard. If we reserve large chunk of memory (1G) for crashkernel, it may cause OOM in the first kernel

Appreciate for your review!

@coiby
Copy link
Member

coiby commented Sep 5, 2025

Hi Coiby

For the first question: where low memory is not enough happen? Before crashkernel= is handled by the kernel, there are several subsystem has been initialized. If they occupy too many memory below 4G, there will be no enough memory below 4G left for crashkernel=. Also, there are some machines with sparse physical memory below 4G. It is short of memory below 4GB.

Thanks for your answer!

For the second question why crashkernel=512M,low fails but crashkernel=512M works or is it a known bug? It is not a bug. crashkernel=512M,low will only try its luck under 4G. On the other hand, crashkernel=512M will try it luck under 4G in the first place, if it fails, it will try its luck from memory address greater than 4G. In the latter case, it only reserves additional 128MB below 4GB for bounce-DMA.

But if you check my pasted logs, for crashkernel=512M, the allocated memory for crash kernel is e0000000-ffffffff which belongs to the low memory. Besides only one line matches "Crash kernel" so there is no additional 128MB below 4GB.

I will improve the notes around the get_crashkernel_low_value().

Besides that, I just realize that this solution should skip the 2G-4G:256M. Because in this case, it usually a guest without smmu or mlx netcard. If we reserve large chunk of memory (1G) for crashkernel, it may cause OOM in the first kernel

Appreciate for your review!

My pleasure! Thanks for addressing my feedback!

@pfliu pfliu force-pushed the fix_osbuild branch 2 times, most recently from c014ca1 to a3bcf79 Compare September 23, 2025 07:06
@pfliu
Copy link
Collaborator Author

pfliu commented Sep 24, 2025

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aims to maximize the reserved crashkernel size for OSTree-based deployments to prevent OOM issues. This is achieved by introducing a maximize_crashkernel flag that bypasses platform-specific memory deduction heuristics. The changes are generally well-structured, introducing helper functions and centralizing logic. However, I've identified a critical issue in kdumpctl where a newly modified function is used incorrectly, which will cause the script to fail. I've also found a potential logic issue in kdump-lib.sh related to the same new functionality, which could lead to insufficient memory reservation on some systems.

kdumpctl Outdated
# unlike kexec_crash_size which does. Systems using DMA-bounce buffers
# should factor this into the calculation.
_bounce_buff_size=$(get_crashkernel_bounce_buffer_size)
_recommend_value=$(_crashkernel_add "$_recommend_value" "$_bounce_buff_size" 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The function _crashkernel_add is being used incorrectly here. _recommend_value holds a single memory size (e.g., "256M"), not a full crashkernel string with multiple ranges. Calling _crashkernel_add with skip=1 on a single value will cause it to return an empty string, leading to a script error in the following lines when to_bytes is called on an empty value. To simply add the _bounce_buff_size to _recommend_value, you should use the memsize_add function directly.

Suggested change
_recommend_value=$(_crashkernel_add "$_recommend_value" "$_bounce_buff_size" 1)
_recommend_value=$(memsize_add "$_recommend_value" "$_bounce_buff_size")

kdump-lib.sh Outdated
fi
echo -n "$(_crashkernel_add "$_ck_cmdline" "${_delta}M")"
echo -n "$(_crashkernel_add "$_ck_cmdline" "${_delta}M" 1)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The _crashkernel_add function is called with skip=1, which prevents the memory _delta from being added to the first memory range in _ck_cmdline (e.g., "2G-4G"). This seems incorrect, as features like SME/SEV (which contribute to _delta) can be active on systems with less than 4GB of RAM, and they would still require the extra memory for kdump to operate reliably. This could lead to an underestimation of required memory and potential OOM issues during a crash dump on those systems. It's recommended to apply the delta to all memory ranges by using skip=0 (or omitting the argument, as 0 is the default).

Suggested change
echo -n "$(_crashkernel_add "$_ck_cmdline" "${_delta}M" 1)"
echo -n "$(_crashkernel_add \"$_ck_cmdline\" \"${_delta}M\")"

@pfliu
Copy link
Collaborator Author

pfliu commented Sep 26, 2025

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively addresses the underestimation of crashkernel memory in OSTree environments by maximizing the reserved size. The introduction of the maximize_crashkernel flag and its application to bypass platform detection is a solid approach. The refactoring of shared functions into kdump-lib.sh improves code organization. Additionally, the new warning for over-provisioned memory is a helpful feature for users. My review includes a couple of minor suggestions to improve efficiency and the user-friendliness of a warning message.

@coiby
Copy link
Member

coiby commented Oct 15, 2025

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to maximize the reserved crashkernel size in osbuild environments to prevent out-of-memory issues. This is achieved by adding a maximize_crashkernel flag that bypasses platform-specific memory deduction heuristics. The changes are well-implemented, with logic correctly refactored into kdump-lib.sh and new functionality covered by tests. A helpful warning is also added to guide users in releasing any excess reserved memory after boot. My review includes a couple of suggestions to improve code efficiency and user-friendliness in the newly added functions.

Comment on lines +1145 to +1227
start_hex=$(echo "$first_line" | awk '{split($1, a, "-"); print a[1]}')
end_hex=$(echo "$first_line" | awk '{split($1, a, "-"); print a[2]}')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

These two lines invoke awk twice on the same input ($first_line) to extract the start and end addresses. This is inefficient. You can achieve the same result with a single awk command and read, which is cleaner and more performant.

Suggested change
start_hex=$(echo "$first_line" | awk '{split($1, a, "-"); print a[1]}')
end_hex=$(echo "$first_line" | awk '{split($1, a, "-"); print a[2]}')
read -r start_hex end_hex < <(echo "$first_line" | awk '{split($1, a, "-"); print a[1], a[2]}')

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @pfliu this suggestion from Gemini seems to make sense, what do you think?

@coiby
Copy link
Member

coiby commented Oct 30, 2025

Hi @pfliu

Thanks for pushing a new version. The PR LGTM! Once you fix the unit-tests and fortmat-check failures, we are good to go!

@pfliu pfliu force-pushed the fix_osbuild branch 2 times, most recently from 1c0583b to 1220a58 Compare November 21, 2025 06:56
Pingfan Liu added 7 commits November 21, 2025 16:00
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Currently, crashkernel_add() adjusts each range in the 'crashkernel='
command line by adding a delta value. This adjustment is necessary
because certain hardware features require significant additional memory
(up to 1GB on some ARM64 machines).

While this approach works well for traditional installations, OSTree
deployments present a challenge: kdump cannot detect the hardware
environment on the target platform. One solution is to maximize the
memory reserved for the kdump kernel.  However, while this increased
allocation ensures kdump functionality, it raises the risk of OOM
conditions in the production kernel, particularly on small-memory
machines that typically lack such hardware features (Mellanox network
cards, SMMU, SME).

This patch extends crashkernel_add() with a parameter
to skip the first N ranges in the 'crashkernel=' command line. Later, we
can use it to skip the first range.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Signed-off-by: Pingfan Liu <piliu@redhat.com>
This increased "crashkernel=" raises the risk of OOM conditions in the
production kernel, particularly on small-memory machines that typically
lack such hardware features (Mellanox network cards, SMMU). So we
skip that case on aarch64. For other arches, the above factors are not
considered, so ignore them for the time being.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
_is_osbuild() will be used in kdump-lib.sh, so moving it from kdumpctl
to kdump-lib.sh

Signed-off-by: Pingfan Liu <piliu@redhat.com>
At present, the deduction of the proper crashkernel value depends on
detecting platform details. However, more and more platforms are now
deployed using OSTree images, which means kdump-utils cannot retrieve
complete platform information. This often leads to an underestimation of
the memory required by the kdump kernel, increasing the risk of
out-of-memory (OOM) issues.

To mitigate this situation, we choose to maximize the reserved
crashkernel size in the osbuild case, and recommend users to update the
kernel command line and release the excess reserved memory after reboot
(addressed in the next patch).

Signed-off-by: Pingfan Liu <piliu@redhat.com>
The crashkernel value is maximized for a OSTree image. When the image
runs on the final platform, kdump-utils has the chance to get the proper
reserved memory size and prompts the user to update the crashkernel
parameter.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants