Commit 8652117

Author: Jon Palmer
add ability to track memory usage for ab initio #22
1 parent 9a30039 commit 8652117

8 files changed, +1282 -78 lines changed

CITATION.cff

Lines changed: 3 additions & 3 deletions
@@ -1,4 +1,4 @@
-cff-version: 1.2.0
+cff-version: version = "25.7.1"
 title: 'funannotate2: eukaryotic genome annotation'
 message: >-
   If you use this software, please cite it using the
@@ -17,5 +17,5 @@ keywords:
 - functional annotation
 - consensus gene models
 license: BSD-2-Clause
-version: "25.7.1"
-date-released: '2025-07-07'
+version: version = "25.7.1"
+date-released: '2025-07-14'

MEMORY_MONITORING.md

Lines changed: 244 additions & 0 deletions
@@ -0,0 +1,244 @@
# Memory Monitoring for Funannotate2 Ab Initio Predictions

This document describes the memory monitoring and prediction system implemented for funannotate2's ab initio gene prediction step.

## Overview

The memory monitoring system provides:

1. **Memory Usage Prediction** - Estimate memory requirements based on contig length
2. **Real-time Memory Monitoring** - Track actual memory usage of subprocess calls
3. **Memory-aware CPU Allocation** - Adjust parallelization based on memory constraints
4. **Memory Usage Reporting** - Generate detailed memory usage reports

## Features

### 1. Memory Prediction Models

The system includes empirical models to predict memory usage for each ab initio tool:

- **SNAP**: Base 50 MB + 0.5 MB per MB of sequence
- **Augustus**: Base 100 MB + 2.0 MB per MB of sequence
- **GlimmerHMM**: Base 30 MB + 0.3 MB per MB of sequence
- **GeneMark**: Base 80 MB + 1.0 MB per MB of sequence

These models provide rough estimates that can be refined with actual usage data.
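
The linear models listed above can be sketched in a few lines. This is an illustration only, not the project's code: the names `MODELS`, `BASES_PER_MB`, and `predict_peak_mb` are hypothetical, and treating 1 MB of sequence as 1,048,576 bp is an assumption, chosen because it reproduces the figures in the example output further down (51.4 MB for SNAP and 105.4 MB for Augustus on a 2,847,392 bp contig).

```python
# Illustrative linear model: predicted MB = base + slope * sequence size in MB.
# Coefficients come from the list above; all names here are hypothetical.
MODELS = {
    "snap": (50.0, 0.5),
    "augustus": (100.0, 2.0),
    "glimmerhmm": (30.0, 0.3),
    "genemark": (80.0, 1.0),
}

# Assumption: 1 MB of sequence = 1024 * 1024 bp.
BASES_PER_MB = 1024 * 1024


def predict_peak_mb(tool_name: str, contig_length: int) -> float:
    """Return the predicted peak memory in MB for one tool and one contig."""
    base_mb, mb_per_seq_mb = MODELS[tool_name.lower()]
    return base_mb + mb_per_seq_mb * (contig_length / BASES_PER_MB)


if __name__ == "__main__":
    for tool in ("snap", "augustus"):
        print(f"{tool}: {predict_peak_mb(tool, 2_847_392):.1f} MB")
```

Because the slope terms are small, the base cost dominates for contigs of a few megabases, so per-tool estimates differ mostly by their constant term.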

### 2. Real-time Memory Monitoring

Uses `psutil` to monitor subprocess memory usage in real-time:

- Tracks RSS (Resident Set Size) and VMS (Virtual Memory Size)
- Monitors parent process and all child processes
- Samples memory usage at configurable intervals (default: 100ms)
- Calculates peak, average, and duration statistics
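
As a rough illustration of the last point, peak, average, and duration figures can be derived from a list of timestamped samples as below. The `Sample` type, the `summarize` function, and the dictionary keys are hypothetical and are not taken from the funannotate2 source.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One memory reading (hypothetical structure)."""
    timestamp: float  # seconds, e.g. from time.time()
    rss_mb: float     # resident set size in MB
    vms_mb: float     # virtual memory size in MB


def summarize(samples: list[Sample]) -> dict:
    """Reduce raw samples to peak, average, and duration statistics."""
    if not samples:
        # Nothing was sampled, e.g. the process exited before the first reading.
        return {"peak_rss_mb": 0.0, "peak_vms_mb": 0.0,
                "avg_rss_mb": 0.0, "duration_s": 0.0, "sample_count": 0}
    return {
        "peak_rss_mb": max(s.rss_mb for s in samples),
        "peak_vms_mb": max(s.vms_mb for s in samples),
        "avg_rss_mb": sum(s.rss_mb for s in samples) / len(samples),
        # Duration spans the first to the last sample, so it slightly
        # underestimates wall-clock time when the sampling interval is coarse.
        "duration_s": samples[-1].timestamp - samples[0].timestamp,
        "sample_count": len(samples),
    }
```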

### 3. Memory-aware Scheduling

Automatically adjusts CPU allocation based on:

- Available system memory
- Predicted memory usage per process
- User-specified memory limits
- System memory buffer (20% reserved for OS)

### 4. Integration with Existing Code

The memory monitoring is integrated into:

- `runSubprocess()` - Optional memory monitoring for individual commands
- `abinitio_wrapper()` - Memory prediction and logging per contig
- `runProcessJob()` - Memory-aware CPU allocation for multiprocessing

## Usage

### Command Line Options

Add memory monitoring to funannotate2 predict:

```bash
# Enable memory monitoring
funannotate2 predict -i input_dir --monitor-memory

# Enable memory monitoring with memory limit
funannotate2 predict -i input_dir --monitor-memory --memory-limit 16
```

### CLI Options

- `--monitor-memory`: Enable memory monitoring and prediction
- `--memory-limit GB`: Set memory limit in GB to adjust CPU allocation

### Example Output

When memory monitoring is enabled, you'll see output like:

```
Memory monitoring enabled for ab initio predictions
Memory limit set to 16.0 GB
Memory usage estimate for 150 contigs with tools ['snap', 'augustus']:
Total estimated peak memory: 2847.3 MB
System memory: 14.2 GB available
Processing contig scaffold_1.fasta (length: 2,847,392 bp)
SNAP memory prediction for scaffold_1.fasta: 51.4 MB
Augustus memory prediction for scaffold_1.fasta: 105.4 MB
Memory usage for snap-scaffold_1.fasta:
Process: snap-scaffold_1.fasta
Duration: 12.34 seconds
Peak RSS: 48.2 MB
Peak VMS: 156.7 MB
Average RSS: 42.1 MB
Samples collected: 247
```

## API Reference

### Core Functions

#### `predict_memory_usage(tool_name, contig_length, prediction_data=None)`

Predict memory usage for an ab initio tool based on contig length.

**Parameters:**
- `tool_name`: Name of the ab initio tool ('snap', 'augustus', etc.)
- `contig_length`: Length of the contig in base pairs
- `prediction_data`: Optional historical data for improved predictions

**Returns:** Dictionary with predicted memory usage statistics

#### `MemoryMonitor.monitor_process(process, process_name)`

Monitor memory usage of a subprocess in real-time.

**Parameters:**
- `process`: subprocess.Popen object to monitor
- `process_name`: Name identifier for the process

**Returns:** Dictionary containing memory statistics

#### `estimate_total_memory_usage(contigs, tools, prediction_data=None)`

Estimate total memory usage for running ab initio predictions on multiple contigs.

**Parameters:**
- `contigs`: List of contig file paths
- `tools`: List of ab initio tools to run
- `prediction_data`: Optional historical data

**Returns:** Dictionary with total memory estimates

#### `suggest_cpu_allocation(total_memory_estimate, available_memory_gb, max_cpus)`

Suggest optimal CPU allocation based on memory constraints.

**Parameters:**
- `total_memory_estimate`: Total estimated memory usage in MB
- `available_memory_gb`: Available system memory in GB
- `max_cpus`: Maximum number of CPUs available

**Returns:** Dictionary with CPU allocation suggestions

### Utility Functions

#### `get_system_memory_info()`

Get current system memory information.

**Returns:** Dictionary with system memory statistics

#### `get_contig_length(contig_file)`

Get the length of a contig from a FASTA file.

**Parameters:**
- `contig_file`: Path to the contig FASTA file

**Returns:** Length of the contig in base pairs
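
A helper of this kind can be written with the standard library alone. The sketch below is an assumption about the behaviour, not the actual implementation: `fasta_length` is a hypothetical name, and it simply counts every sequence character in the file, which equals the contig length when the file holds a single record.

```python
def fasta_length(path: str) -> int:
    """Count sequence characters in a FASTA file, skipping header lines.

    Assumes one contig per file; with several records this returns
    the combined length of all of them.
    """
    total = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                continue  # header line, not sequence
            total += len(line.strip())  # strip newline and stray whitespace
    return total
```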

#### `format_memory_report(stats)`

Format memory statistics into a human-readable report.

**Parameters:**
- `stats`: Memory statistics dictionary

**Returns:** Formatted string report
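
To make the shape of such a report concrete, here is a minimal sketch that produces the same labels as the example output earlier in this document. The function name `format_report` and the dictionary keys are hypothetical; the real statistics dictionary may use different keys.

```python
def format_report(stats: dict) -> str:
    """Render a statistics dictionary as a multi-line text report.

    The keys used here are illustrative assumptions.
    """
    lines = [
        f"Process: {stats['process_name']}",
        f"Duration: {stats['duration_s']:.2f} seconds",
        f"Peak RSS: {stats['peak_rss_mb']:.1f} MB",
        f"Peak VMS: {stats['peak_vms_mb']:.1f} MB",
        f"Average RSS: {stats['avg_rss_mb']:.1f} MB",
        f"Samples collected: {stats['sample_count']}",
    ]
    return "\n".join(lines)
```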

## Implementation Details

### Memory Monitoring Process

1. **Prediction Phase**: Before running ab initio tools, estimate memory usage based on contig lengths
2. **System Check**: Assess available system memory and suggest CPU allocation
3. **Real-time Monitoring**: During subprocess execution, sample memory usage at regular intervals
4. **Reporting**: Log memory statistics and generate reports
5. **Model Updates**: Optionally update prediction models with actual usage data

### Memory Sampling

The memory monitor:
- Creates a `psutil.Process` object for the subprocess
- Samples memory usage every 100ms (configurable)
- Tracks both the main process and all child processes
- Handles process termination gracefully
- Calculates statistics from all samples
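
The sampling loop described above might look roughly like the following. This is a sketch under stated assumptions, not the `MemoryMonitor` class itself: `read_tree_memory`, `sample_until`, and `monitor` are hypothetical names. The only `psutil` calls used are `psutil.Process`, `Process.children(recursive=True)`, and `Process.memory_info()`, and `psutil` is imported lazily so the generic loop can be exercised without it.

```python
import subprocess
import threading
import time
from typing import Callable, List, Tuple

MB = 1024 * 1024


def read_tree_memory(pid: int) -> Tuple[float, float]:
    """Sum RSS and VMS (in MB) over a process and all of its descendants."""
    import psutil  # third-party dependency, imported only when needed

    try:
        parent = psutil.Process(pid)
        procs = [parent] + parent.children(recursive=True)
    except psutil.NoSuchProcess:
        return 0.0, 0.0  # the process already exited
    rss = vms = 0
    for proc in procs:
        try:
            info = proc.memory_info()
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue  # child exited or cannot be inspected; skip it
        rss += info.rss
        vms += info.vms
    return rss / MB, vms / MB


def sample_until(is_running: Callable[[], bool],
                 read_memory: Callable[[], Tuple[float, float]],
                 interval: float = 0.1) -> List[Tuple[float, float, float]]:
    """Collect (timestamp, rss_mb, vms_mb) tuples while is_running() is true."""
    samples = []
    while is_running():
        rss_mb, vms_mb = read_memory()
        samples.append((time.time(), rss_mb, vms_mb))
        time.sleep(interval)
    return samples


def monitor(process: subprocess.Popen, interval: float = 0.1):
    """Sample a running subprocess from a background thread until it exits."""
    samples: List[Tuple[float, float, float]] = []

    def worker() -> None:
        samples.extend(sample_until(lambda: process.poll() is None,
                                    lambda: read_tree_memory(process.pid),
                                    interval))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    process.wait()   # block until the tool finishes
    thread.join()    # the worker stops once poll() reports an exit code
    return samples
```

Passing the "is it still running" check and the memory reader as callables keeps the loop independent of `psutil`, which makes it easy to test with fake readings.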
185+
186+
### CPU Allocation Logic
187+
188+
The system adjusts CPU allocation by:
189+
1. Estimating memory usage per parallel process
190+
2. Calculating how many processes can fit in available memory
191+
3. Leaving a 20% buffer for the operating system
192+
4. Ensuring at least 1 CPU is allocated
193+
5. Not exceeding the user-specified maximum
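
The five steps above reduce to a short calculation. The sketch below is a simplified stand-in for `suggest_cpu_allocation`, which returns a dictionary and takes a total estimate; here `suggest_cpus` is a hypothetical function that takes a per-process estimate directly and returns only the CPU count.

```python
def suggest_cpus(per_process_mb: float, available_gb: float,
                 max_cpus: int, buffer_fraction: float = 0.2) -> int:
    """Return how many parallel processes fit in memory, between 1 and max_cpus.

    buffer_fraction is the share of available memory left for the OS
    (20% in the description above).
    """
    usable_mb = available_gb * 1024 * (1.0 - buffer_fraction)
    if per_process_mb <= 0:
        # No meaningful estimate: fall back to the CPU limit alone.
        return max(1, max_cpus)
    fit = int(usable_mb // per_process_mb)  # whole processes that fit
    return max(1, min(fit, max_cpus))       # at least 1, never above the cap
```

For example, with 16 GB available and an estimate of 2000 MB per process, 80% of 16384 MB is 13107.2 MB, which fits six processes, so a request for eight CPUs would be reduced to six.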
194+
195+
## Testing
196+
197+
Run the test suite to verify functionality:
198+
199+
```bash
200+
python test_memory_monitoring.py
201+
```
202+
203+
This will test:
204+
- Memory prediction models
205+
- System memory information
206+
- CPU allocation suggestions
207+
- Total memory estimation
208+
- Real-time memory monitoring
209+
210+
## Dependencies
211+
212+
The memory monitoring system requires:
213+
214+
- `psutil` - For system and process memory monitoring
215+
- `json` - For saving/loading memory statistics
216+
- `time` - For timing and sampling
217+
- `threading` - For concurrent memory monitoring
218+
219+
## Future Enhancements
220+
221+
Potential improvements include:
222+
223+
1. **Machine Learning Models** - Use actual usage data to train better prediction models
224+
2. **Memory Profiling** - Detailed analysis of memory allocation patterns
225+
3. **Dynamic Scheduling** - Adjust CPU allocation during runtime based on actual usage
226+
4. **Memory Limits** - Hard memory limits with process termination
227+
5. **Historical Analysis** - Long-term memory usage trends and optimization
228+
6. **Tool-specific Tuning** - Fine-tune memory models for different ab initio tools
229+
230+
## Troubleshooting
231+
232+
### Common Issues
233+
234+
1. **psutil not available**: Install with `pip install psutil`
235+
2. **Permission errors**: Some systems may restrict process monitoring
236+
3. **Inaccurate predictions**: Models are empirical and may need tuning for your data
237+
4. **Memory monitoring overhead**: Monitoring adds small CPU/memory overhead
238+
239+
### Performance Impact
240+
241+
Memory monitoring has minimal performance impact:
242+
- ~1-2% CPU overhead for sampling
243+
- ~1-5 MB memory overhead for the monitor
244+
- Sampling interval can be adjusted to reduce overhead

funannotate2/__main__.py

Lines changed: 11 additions & 0 deletions
@@ -275,6 +275,17 @@ def predict_subparser(subparsers):
     optional_args.add_argument(
         "--tmpdir", default="/tmp", help="volume to write tmp files", metavar=""
     )
+    optional_args.add_argument(
+        "--monitor-memory",
+        action="store_true",
+        help="Monitor memory usage of ab initio prediction tools",
+    )
+    optional_args.add_argument(
+        "--memory-limit",
+        type=float,
+        help="Memory limit in GB to adjust CPU allocation",
+        metavar="",
+    )
     other_args = group.add_argument_group("Other arguments")
     other_args.add_argument(
         "-h",
