
Conversation

@ValentaTomas (Member) commented on Nov 15, 2025

To gain more control over memfile handling during snapshotting, we modify the snapshot logic to optionally skip dumping the memfile.

It also adds information about the memory regions when requesting instance info, so we can copy the memory directly.

This is fully compatible with FC v1.12.

This PR is for viewing code changes only and won't be merged.
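
As a rough illustration of the two changes described above, the sketch below shows (a) a snapshot-create parameter with an opt-out for dumping the memfile and (b) the per-region memory information that instance info could return. All struct and field names here (dump_memfile, MemoryRegionInfo, and so on) are assumptions for illustration, not the actual identifiers in this PR.

```rust
// Illustrative only: field/struct names are assumptions, not the PR's API.
use serde::{Deserialize, Serialize};

fn default_dump_memfile() -> bool {
    true // keep the pre-existing behaviour unless the caller opts out
}

/// Snapshot-create parameters with an optional memfile dump.
#[derive(Debug, Deserialize)]
pub struct SnapshotCreateParams {
    pub snapshot_path: String,
    pub mem_file_path: Option<String>,
    /// When false, the snapshot logic skips writing guest memory to the
    /// memfile; the caller copies memory via the exposed regions instead.
    #[serde(default = "default_dump_memfile")]
    pub dump_memfile: bool,
}

/// Per-region memory information returned alongside instance info, so a
/// client can copy guest memory straight out of the VMM process (for
/// example via /proc/<pid>/mem at the given host offsets).
#[derive(Debug, Serialize)]
pub struct MemoryRegionInfo {
    /// Guest-physical address where the region starts.
    pub guest_phys_addr: u64,
    /// Host virtual address of the mapping inside the Firecracker process.
    pub host_virt_addr: u64,
    /// Region size in bytes.
    pub size: u64,
}
```

A client reading these regions directly would pair naturally with skipping the memfile dump.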

@ValentaTomas added the wontfix label (This will not be worked on) on Nov 15, 2025
@ValentaTomas changed the title from "Expose memory mapping; Add optional memfile dump" to "[v1.12] Expose memory mapping; Add optional memfile dump" on Nov 15, 2025
@ValentaTomas changed the title from "[v1.12] Expose memory mapping; Add optional memfile dump" to "[v1.12] Expose memory mapping; Make memfile dump optional" on Nov 15, 2025
@ValentaTomas (Member, Author) commented:

We might also make the snapfile optional, or return it as content instead.

@ValentaTomas force-pushed the firecracker-v1.12-direct-mem branch from a3608ad to 214264f on December 18, 2025 10:07
@ValentaTomas force-pushed the firecracker-v1.12-direct-mem branch from 214264f to 01f3491 on December 18, 2025 10:08
@tomassrnka (Member) commented:

AI-assisted review notes:

  • /memory full scan faults pages in: Vm::get_memory_info (src/vmm/src/vstate/vm.rs) walks every page and memcmps each "resident" page; if mincore fails, it assumes all pages are resident and memcmps them anyway. This can fault non-resident/ballooned pages back in, pull gigabytes into memory, and block the VMM lock for the duration. At minimum, skip the memcmp unless mincore says the page is resident, propagate mincore errors, and consider chunking/limits or scanning outside the VMM lock.

  • Long-held VMM lock during scan: RuntimeApiController::GetMemory holds the global VMM mutex while calling get_memory_info, so the entire page scan runs under the lock. On large guests this can block other API operations and VM work. Collect the mappings under the lock and perform the scan without it, or otherwise limit scan time (a sketch of both mitigations follows this list).
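
A minimal sketch of both mitigations, assuming a hypothetical Mapping { host_addr, len } descriptor (this is not the code in the PR): copy the mapping list while the VMM lock is held, drop the lock, then walk pages in bounded chunks and consult mincore() so non-resident pages are never touched.

```rust
// Illustrative sketch (requires the `libc` crate); names are assumptions.
use std::io;
use std::sync::Mutex;

const PAGE_SIZE: usize = 4096;
const CHUNK_PAGES: usize = 4096; // ~16 MiB per mincore() call keeps each pass short

#[derive(Clone)]
struct Mapping {
    host_addr: usize,
    len: usize,
}

fn count_resident_pages(mappings: &Mutex<Vec<Mapping>>) -> io::Result<usize> {
    // Hold the lock only long enough to copy the small mapping descriptors.
    let snapshot: Vec<Mapping> = mappings.lock().unwrap().clone();

    let mut resident = 0;
    for m in &snapshot {
        let total_pages = m.len / PAGE_SIZE;
        for chunk in (0..total_pages).step_by(CHUNK_PAGES) {
            let pages = CHUNK_PAGES.min(total_pages - chunk);
            let addr = m.host_addr + chunk * PAGE_SIZE;
            let mut vec = vec![0u8; pages];

            // Propagate mincore errors instead of assuming every page is resident.
            let ret = unsafe {
                libc::mincore(addr as *mut libc::c_void, pages * PAGE_SIZE, vec.as_mut_ptr())
            };
            if ret != 0 {
                return Err(io::Error::last_os_error());
            }

            // Only pages the kernel reports as resident are counted (or, in the
            // real code, memcmp'd); non-resident pages are skipped so they are
            // never faulted back in.
            resident += vec.iter().filter(|&&b| (b & 1) != 0).count();
        }
    }
    Ok(resident)
}
```

The memcmp against the baseline memfile would slot into the resident branch; the important part is that the expensive walk happens outside the VMM mutex and honours mincore's verdict.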

@tomassrnka self-assigned this on Dec 18, 2025
@tomassrnka (Member) commented:

Discussed IRL; looks good to me.

Manciukic and others added 5 commits on December 19, 2025 07:56
An upstream patch was backported to the Ubuntu 24.04 6.8.0-58 kernel that
makes the NX hugepages recovery thread a child of the firecracker
process, increasing the process count to 7.
Since this test isn't really interested in how many threads we have,
remove the assertion altogether.

Signed-off-by: Riccardo Mancini <mancio@amazon.com>