Skip to content

feat: implement dynamic batch sizing for ML validation#816

Open
khresth wants to merge 7 commits intoSamsung:mainfrom
khresth:main
Open

feat: implement dynamic batch sizing for ML validation#816
khresth wants to merge 7 commits intoSamsung:mainfrom
khresth:main

Conversation

@khresth
Copy link

@khresth khresth commented Feb 9, 2026

Description

Implemented dynamic batch sizing for ML validation to optimize memory usage and performance across different system configurations. The feature automatically adjusts batch sizes based on available memory, preventing OOM errors while maximizing throughput.

Changes Made:

  • Add MemoryMonitor class for real-time memory tracking using psutil
  • Enhance MlValidator with adaptive batch size calculation (8-512 candidates)
  • Add memory pressure detection and automatic batch size adjustment
  • Implement garbage collection optimization when memory pressure is detected
  • Add comprehensive test coverage for all components (100% pass rate)
  • Update requirements.txt with psutil dependency

This improves performance by maximizing throughput on systems with abundant memory while preventing OOM errors on constrained systems.

How has this been tested?

  • UnitTest - Created comprehensive test suites:
    • test_memory_monitor.py - Memory tracking and pressure detection
    • test_dynamic_batching.py - Batch size calculation and adaptation
    • test_integration.py - End-to-end functionality
    • All tests pass with 100% success rate
  • Memory Stress Testing - Verified behavior under various memory conditions
  • Backward Compatibility - Ensured existing fixed batch sizing still works

- Add MemoryMonitor class for real-time memory tracking
- Enhance MlValidator with adaptive batch size calculation
- Add memory pressure detection and automatic adjustment
- Implement garbage collection optimization
- Add comprehensive test coverage for all components
- Update requirements.txt with psutil dependency

This improves performance by maximizing throughput on systems with
abundant memory while preventing OOM errors on constrained systems.
@khresth khresth requested a review from a team as a code owner February 9, 2026 16:02
Copy link
Contributor

@babenek babenek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @khresth
thank you for the good idea - limit the batch size based on memory consumption.
Unfortunately it useless in some cases - memory limitation by supervisor (it kills a process with an overhead - no any evaluations or garbage collections).
IMHO, psutil is useless because memory allocation for batch is predictable (with some deviations) - it can be estimated even with tests.
Tests: the correct test may be done in TestApp class with subprocess launch and memory limitation. Negative and positive test cases.
Example:

$ ulimit -v 1000000
$ python -m credsweeper --path .
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: std::bad_alloc
$ python -m credsweeper --path . --ml_batch_size 4
Detected Credentials: 4362
Time Elapsed: 34.02959942817688s

The default batch size is ok for 2G limit. Please, provide more details which issue you solved. Performance may be unstable also with odd/even batch size (CPU/GPU threads limitation). BTW, GPU case may have another limitations...

So, my proposal is to add precalculated minimal size of memory for a batch size and print them in --help (ml_engine may allocate unpredictable size). Otherwise, the tool should be used a bit different if the memory limitation exists.

Your tests should pass in fork action first (main branch launches some of them with push)

Updated beautifulsoup4 to version 4.14.3 and added striprtf version 0.0.29. Removed psutil version 6.1.0.
Added memory usage estimates and methods for batch size handling.
 Reverted to original batch size usage
Enhanced help text with memory info
Implement unit tests for memory limits in ML validation.
@khresth khresth requested a review from babenek February 10, 2026 09:51
Copy link
Contributor

@babenek babenek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. --help iprovements only (python gc is useless) - define limits
  2. tests
  3. CI in your fork first

if memory_mb < 1000:
info_lines.append(f" Batch size {batch_size:3d}: ~{memory_mb}MB")
else:
info_lines.append(f" Batch size {batch_size:3d}: ~{memory_mb//1000}GB")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RAM uses x^2 measurement

except subprocess.TimeoutExpired:
self.skipTest("ML validation timed out")
except Exception as e:
self.skipTest(f"Could not run ML validation: {e}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there should be at least 2 type of tests:
def test_xxx_n(self...): - negative. Discover CredSweeper failure for memory consumption in a limit and huge ml_batch
def test_xxx_p(self...): - positive. Discover the failure has gone when batch reduced adn/or memory limit increased
The limitations should related with help info.


def test_low_memory_batch_size(self):
"""Test that small batch size works under memory constraints"""
test_file = self.test_dir / "memory_test_data.txt"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

temporally files must be created in temporally directory


def force_garbage_collection(self) -> float:
before_mb = self.get_memory_info().process_mb
gc.collect()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose, the python gc is not helpful when onnxruntime allocated memory in native code. class MemoryMonitor is not necessary. There are only recommendation of memory consumption in help and the tests valuable.

parser.add_argument("--ml_batch_size",
"-b",
help="batch size for model inference (default: 16)",
help=f"batch size for model inference (default: 16)\n\n{MlValidator.get_memory_info_text()}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let use a constant for minimal required memory for the default batch size. The constant must be used in tests for memory limitation. Try subprocess+resource (the tests may be skipped for Windows)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants