Check # CPUs and memory size on startup#5128
Check # CPUs and memory size on startup#5128drebelsky wants to merge 1 commit intostellar:masterfrom
Conversation
b32e4b6 to
1b955c0
Compare
|
Looking at the docs, maybe we should be more explicit that we only expect to work on 8 vCPUs and 16 GiB RAM?
|
There was a problem hiding this comment.
Pull request overview
Adds a startup-time system requirements check for mainnet validators, ensuring the host has at least the documented minimum CPU and RAM before allowing stellar-core to start, with config-based override/acknowledgement paths when auto-detection fails or is intentionally ignored.
Changes:
- Expose a Rust bridge function to read host total memory and wire it into C++ startup logic.
- Add new config parameters to handle “unknown system info” defaults and explicit per-node/per-version ignore acknowledgements.
- Enforce minimum specs (16 GiB RAM, 8 vCPUs) during
run()startup for production-network validators.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/rust/src/common.rs |
Adds get_host_total_memory() using sys-info. |
src/rust/src/bridge.rs |
Exposes the new Rust function via the CXX bridge. |
src/rust/Cargo.toml |
Adds sys-info dependency. |
src/main/Config.h |
Introduces new SYSCHECK_* configuration fields. |
src/main/Config.cpp |
Adds parsing for the new SYSCHECK_* config keys. |
src/main/CommandLine.cpp |
Implements and invokes startup validation for RAM/CPU on mainnet validators. |
Cargo.lock |
Locks the new Rust dependency. |
| if (memory < static_cast<uint32_t>(16) * 1024 * 1024) | ||
| { | ||
| if (cfg.SYSCHECK_FORCE_IGNORE_MEMORY != annoyingValue) | ||
| { | ||
| LOG_ERROR( |
There was a problem hiding this comment.
The minimum RAM requirement (16 GiB) is duplicated across the numeric check and multiple log strings. To reduce the chance of inconsistencies if requirements change, consider introducing a named constant (e.g., MIN_VALIDATOR_RAM_KIB) and using it for both the comparison and messaging.
| if (cpus < 8) | ||
| { | ||
| if (cfg.SYSCHECK_FORCE_IGNORE_CPU != annoyingValue) | ||
| { | ||
| LOG_ERROR(DEFAULT_LOG, |
There was a problem hiding this comment.
The minimum CPU requirement (8 vCPUs) is duplicated across the numeric check and log strings. Consider introducing a named constant (e.g., MIN_VALIDATOR_VCPUS) and reusing it in the comparison + messages to avoid drift when requirements change.
| // Config parameters controlling startup system info validation (only | ||
| // relevant for mainnet validators). See the comment above | ||
| // validateSystemInfo in CommandLine.cpp for details. | ||
| uint32_t SYSCHECK_UNKNOWN_MEMORY_DEFAULT{0}; | ||
| uint32_t SYSCHECK_UNKNOWN_CPU_DEFAULT{0}; | ||
| std::string SYSCHECK_FORCE_IGNORE_MEMORY; | ||
| std::string SYSCHECK_FORCE_IGNORE_CPU; |
There was a problem hiding this comment.
Config defaults in this codebase appear to be centralized in Config::Config() (e.g., LOG_FILE_PATH, BUCKET_DIR_PATH, etc. are explicitly initialized in Config.cpp). These new SYSCHECK_* members are the only ones using in-class initializers / relying on default construction, which makes default values harder to audit. Consider initializing all SYSCHECK_* defaults in Config::Config() (and removing the in-class {0} initializers) for consistency and to keep defaults in one place.
| // Config parameters controlling startup system info validation (only | ||
| // relevant for mainnet validators). See the comment above | ||
| // validateSystemInfo in CommandLine.cpp for details. | ||
| uint32_t SYSCHECK_UNKNOWN_MEMORY_DEFAULT{0}; | ||
| uint32_t SYSCHECK_UNKNOWN_CPU_DEFAULT{0}; | ||
| std::string SYSCHECK_FORCE_IGNORE_MEMORY; | ||
| std::string SYSCHECK_FORCE_IGNORE_CPU; |
There was a problem hiding this comment.
New config parameters (SYSCHECK_UNKNOWN_MEMORY_DEFAULT, SYSCHECK_UNKNOWN_CPU_DEFAULT, SYSCHECK_FORCE_IGNORE_*) were added, but the operator-facing config reference file docs/stellar-core_example.cfg doesn’t include them yet. Since this feature can block startup on mainnet validators, please update the example/config documentation so operators can discover these settings outside of log output.
Adds a check that validators on mainnet are running machines that meet our expected minimum specs. Uses rust
sys-infocrate to get the host memory size (note,sys-infois smaller thansysinfo, but also less popular). Usesstd::thread::hardware_concurrencyas the number of vCPUs. Looking at vCPUs is a little odd since, e.g., the aws machines we use seem to have 4 "physical" cores that each support hyperthreading while as far as I can tell, many ARM chips don't support SMT.