100% Compatible Python reimplementation of me-PCR
Documentation | Quick Start | Verification | Migration Guide
merPCR locates Sequence-Tagged Sites (STS) within genomic sequences using computational PCR. It's a drop-in replacement for the original me-PCR (Multithreaded Electronic PCR), producing identical results while offering modern Python architecture, better error messages, and comprehensive documentation.
Key Highlights:
- 100% Compatible - Verified byte-for-byte identical output to me-PCR v1.0.6
- Drop-In Replacement - Same command-line interface, no changes needed
- Python API - Use programmatically in your Python workflows
- Well Documented - Extensive guides, examples, and API reference
- Thoroughly Tested - 277 tests, 94% coverage of engine.py, validated on real genomic data
merPCR has been extensively validated against me-PCR:
- Real Genomic Data: Tested on 42 Francisellaceae genomes (90MB)
- Compatibility Tests: 15/15 passed (100%)
- Output Identity: MD5 checksums match exactly
- Critical Fixes: Three algorithmic differences identified and fixed
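The output-identity check can be reproduced with a few lines of Python. This is a minimal sketch, not the project's own test harness: the helper names `md5_of_file` and `outputs_identical` are hypothetical, and the two paths passed in would be the result files written by me-PCR and merPCR for the same input.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_identical(path_a, path_b):
    """True when both files have the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

Comparing digests rather than diffing text keeps the check independent of line-ending or encoding handling in the comparison tool.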
Three critical compatibility fixes were implemented:
- Hash computation - Backward vs forward search
- PCR range margins - Proper margin adjustment for size ranges
- Forward strand matching - Reverse complement handling for primer2
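The third fix concerns how primer2 is compared against the forward strand. Primers are conventionally written 5' to 3', so on the forward strand the second primer shows up as its reverse complement. The sketch below shows that transformation only; `reverse_complement` is a hypothetical helper, not merPCR's internal function, and the IUPAC pairs are included because ambiguity codes are an optional feature of the tool.

```python
# Complement table for A/C/G/T plus IUPAC ambiguity codes (upper and lower case).
_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)


def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]
```

With this helper, a forward-strand search for an STS looks for primer1 as written and for `reverse_complement(primer2)` downstream of it.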
Full verification details: docs/VERIFICATION.md
Installation:

    # From PyPI (when available)
    pip install merpcr

    # From source
    git clone https://github.com/FOI-Bioinformatics/merpcr.git
    cd merpcr
    pip install -e .

Command-line usage:

    # Use merPCR exactly like me-PCR
    merpcr primers.sts genome.fa

    # With parameters (both formats supported)
    merpcr primers.sts genome.fa -M 50 -N 1 -O results.txt
    merpcr primers.sts genome.fa M=50 N=1 O=results.txt  # Legacy format works too!

    # Multiple parameters, mixed formats
    merpcr primers.sts genome.fa -M 50 -N 1 -T 4 --debug

Python API:

    from merpcr import MerPCR

    # Initialize with parameters
    engine = MerPCR(wordsize=11, margin=50, mismatches=1, threads=4)

    # Load data and search
    engine.load_sts_file("primers.sts")
    records = engine.load_fasta_file("genome.fa")
    hit_count = engine.search(records, "results.txt")
    print(f"Found {hit_count} hits")

More examples: docs/EXAMPLES.md
- Multithreaded Processing - Automatic thread scaling for large files
- Hash-Based Search - O(1) STS lookup with 2-bit encoding
- IUPAC Support - Optional ambiguity code handling
- Flexible Parameters - Configurable margins, mismatches, and word sizes
- 3' Protection - Prevents mismatches in primer 3' regions
- Modern Architecture - Type-safe Python with comprehensive error handling
- Better Diagnostics - Clear error messages with context
- Debug Mode - Detailed logging for troubleshooting
- Extensive Testing - 277 tests covering edge cases and real data
- CI/CD Pipeline - Automated testing on multiple platforms
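The hash-based search in the feature list relies on packing each nucleotide into two bits, so a word of W bases becomes an integer that can serve as a dictionary key. The following is a simplified sketch of that idea, not merPCR's engine: the A=0, C=1, G=2, T=3 mapping, the choice to hash the leading bases of each primer, and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed 2-bit code; merPCR's internal mapping may differ.
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_word(word):
    """Pack a word of A/C/G/T into an integer, two bits per base.

    Returns None if the word contains any other character (for example N).
    """
    value = 0
    for base in word.upper():
        code = _CODE.get(base)
        if code is None:
            return None
        value = (value << 2) | code
    return value


def build_index(primers, wordsize):
    """Map the hash of each primer's first `wordsize` bases to primer IDs."""
    index = defaultdict(list)
    for primer_id, primer in primers.items():
        key = encode_word(primer[:wordsize])
        if key is not None:
            index[key].append(primer_id)
    return index


def scan(sequence, index, wordsize):
    """Yield (position, primer_id) for every window whose hash is in the index."""
    mask = (1 << (2 * wordsize)) - 1
    value, valid = 0, 0  # rolling hash and count of consecutive unambiguous bases
    for pos, base in enumerate(sequence.upper()):
        code = _CODE.get(base)
        if code is None:
            value, valid = 0, 0  # an ambiguous base resets the window
            continue
        value = ((value << 2) | code) & mask
        valid += 1
        if valid >= wordsize:
            for primer_id in index.get(value, ()):
                yield pos - wordsize + 1, primer_id
```

A hash hit only nominates a candidate position. A real search would still have to compare the full primers, apply the mismatch limit, and check the product size.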
- User Guide - Comprehensive usage guide
- Examples - Real-world use cases and tutorials
- Migration Guide - Migrating from me-PCR
- API Reference - Complete API documentation
- Verification Report - Compatibility testing details
- CI/CD Documentation - GitHub Actions workflows
Input Format (STS file):

    STS_ID Forward_Primer Reverse_Primer PCR_Size [Optional_Alias]
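To make the field layout concrete, here is a minimal sketch of parsing one STS line in Python. It is not merPCR's loader (use `load_sts_file` for that). The sketch assumes tab-separated fields, `#` comment lines, and a PCR size given either as a single integer or as a `low-high` range; `STSRecord` and `parse_sts_line` are hypothetical names.

```python
from typing import NamedTuple, Optional


class STSRecord(NamedTuple):
    """Hypothetical container for one STS line."""
    sts_id: str
    primer1: str
    primer2: str
    size_min: int
    size_max: int
    alias: Optional[str] = None


def parse_sts_line(line):
    """Parse one STS line; return None for blank lines and '#' comments."""
    line = line.rstrip("\r\n")
    if not line.strip() or line.lstrip().startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) < 4:
        raise ValueError(
            f"expected at least 4 tab-separated fields, got {len(fields)}"
        )
    sts_id, primer1, primer2, size = (f.strip() for f in fields[:4])
    # Assumed: the size is either a single integer or a 'low-high' range.
    low, _, high = size.partition("-")
    size_min = int(low)
    size_max = int(high) if high else size_min
    alias = fields[4].strip() if len(fields) > 4 and fields[4].strip() else None
    return STSRecord(sts_id, primer1.upper(), primer2.upper(), size_min, size_max, alias)
```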
Output Format:

    Sequence_ID pos1..pos2 STS_ID Alias (+/-)
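For downstream scripts it can be handy to turn result lines back into structured records. The sketch below follows the layout shown above and rests on several assumptions that should be checked against real output: fields are separated by whitespace, the alias may be absent, and the orientation is the last field, written in parentheses. `Hit` and `parse_hit_line` are hypothetical names, not part of the merPCR API.

```python
import re
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """Hypothetical container for one result line."""
    seq_id: str
    start: int
    end: int
    sts_id: str
    alias: Optional[str]
    strand: str


# Sequence_ID, pos1..pos2, STS_ID, optional alias, then "(+)" or "(-)".
_HIT_RE = re.compile(
    r"^(?P<seq>\S+)\s+(?P<start>\d+)\.\.(?P<end>\d+)\s+(?P<sts>\S+)"
    r"(?:\s+(?P<alias>.*?))?\s+\((?P<strand>[+-])\)\s*$"
)


def parse_hit_line(line):
    """Parse one result line into a Hit, or raise ValueError if it does not fit."""
    match = _HIT_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized hit line: {line!r}")
    return Hit(
        match["seq"],
        int(match["start"]),
        int(match["end"]),
        match["sts"],
        match["alias"] or None,
        match["strand"],
    )
```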
Common Parameters:
- `-M, --margin` - Search margin in bp (default: 50)
- `-N, --mismatches` - Allowed mismatches (default: 0)
- `-W, --wordsize` - Hash word size (default: 11)
- `-T, --threads` - Number of threads (default: 1)
- `-O, --output` - Output file (default: stdout)
- `--debug` - Enable debug logging

Full parameter list: docs/USER_GUIDE.md#parameters
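One way to accept both the dash-style flags and the legacy `KEY=VALUE` form is to rewrite legacy tokens before handing the argument list to `argparse`. The sketch below illustrates that approach under stated assumptions. It is not merPCR's actual CLI code: `normalize_legacy_args` and `build_parser` are hypothetical, and only the options listed above are modelled.

```python
import argparse
import re

# A legacy token is a single uppercase letter, '=', then the value (e.g. M=50).
_LEGACY = re.compile(r"^([A-Z])=(.*)$")


def normalize_legacy_args(argv):
    """Rewrite legacy KEY=VALUE tokens as '-KEY VALUE'; leave other tokens as-is."""
    normalized = []
    for token in argv:
        match = _LEGACY.match(token)
        if match:
            normalized.extend([f"-{match.group(1)}", match.group(2)])
        else:
            normalized.append(token)
    return normalized


def build_parser():
    """Parser covering only the common parameters, with the documented defaults."""
    parser = argparse.ArgumentParser(prog="merpcr-sketch")
    parser.add_argument("sts_file")
    parser.add_argument("fasta_file")
    parser.add_argument("-M", "--margin", type=int, default=50)
    parser.add_argument("-N", "--mismatches", type=int, default=0)
    parser.add_argument("-W", "--wordsize", type=int, default=11)
    parser.add_argument("-T", "--threads", type=int, default=1)
    parser.add_argument("-O", "--output", default=None)
    parser.add_argument("--debug", action="store_true")
    return parser
```

Usage would be `build_parser().parse_args(normalize_legacy_args(sys.argv[1:]))`. Because the rewrite happens token by token, dash-style and legacy arguments can be mixed in one command line.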
Running Tests:

    # Run all tests
    make test

    # Run specific test categories
    make test-unit         # Unit tests only
    make test-integration  # Integration tests
    make test-performance  # Performance benchmarks

    # Generate coverage report
    make coverage

    # Run compatibility tests
    python test_compatibility.py

Current Status:
- 277 tests (all passing)
- 94% code coverage for engine.py (critical component)
- Real genomic data validation on 42 genomes
merPCR now matches or exceeds me-PCR performance with Cython optimization:
| Dataset Size | me-PCR | merPCR (Pure Python) | merPCR (Cython) | Speedup |
|---|---|---|---|---|
| Small (<2MB) | ~0.5s | ~0.5s | ~0.2s | 2.1x |
| Medium (2-4MB) | ~0.8s | ~0.8s | ~0.3s | 2.6-2.9x |
| Large (>4MB) | Scales linearly | Scales linearly | Scales linearly (2.9x faster) | 2.9x+ |
Real Genomic Data (Francisellaceae genomes):
- F. tularensis (1.8 MB): 0.24s with Cython vs 0.51s pure Python (2.1x speedup)
- C. litorale (3.1 MB): 0.30s with Cython vs 0.86s pure Python (2.8x speedup)
- F. hongkongensis (2.8 MB): 0.27s with Cython vs 0.80s pure Python (2.9x speedup)
Average speedup: 2.65x faster than pure Python!
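The average can be checked directly from the timings listed above. This short calculation uses only those six numbers. Each speedup is the pure-Python time divided by the Cython time, and the average is the arithmetic mean of the three unrounded ratios.

```python
from statistics import mean

# (pure Python seconds, Cython seconds), taken from the list above.
timings = {
    "F. tularensis": (0.51, 0.24),
    "C. litorale": (0.86, 0.30),
    "F. hongkongensis": (0.80, 0.27),
}

speedups = {name: pure / cython for name, (pure, cython) in timings.items()}
average_speedup = mean(speedups.values())

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
print(f"average: {average_speedup:.2f}x")
```

The mean of the unrounded ratios comes out at about 2.65, which matches the quoted average.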
Performance Features:
- Automatic Cython optimization (if available)
- Seamless fallback to pure Python
- Multithreading support for large files
- NumPy-accelerated lookup tables
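The "Cython if available, pure Python otherwise" behaviour is commonly implemented with a guarded import. The sketch below shows the general pattern only. The compiled module name `_merpcr_sketch_fast` is made up for illustration and is not merPCR's real module path, and the mismatch counter is a stand-in for whatever hot loop is actually compiled.

```python
def _count_mismatches_py(a, b):
    """Pure-Python reference implementation: count differing positions."""
    return sum(1 for x, y in zip(a, b) if x != y)


try:
    # Hypothetical compiled extension; present only if it was built and installed.
    from _merpcr_sketch_fast import count_mismatches
    USING_CYTHON = True
except ImportError:
    # Extension missing: fall back to the pure-Python version transparently.
    count_mismatches = _count_mismatches_py
    USING_CYTHON = False
```

Callers use `count_mismatches` without caring which implementation is bound, and `USING_CYTHON` can be reported in debug output.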
Full performance details: docs/PERFORMANCE.md
Project Structure:

    merpcr/
    ├── src/merpcr/             # Main package
    │   ├── cli.py              # Command-line interface
    │   ├── core/               # Core functionality
    │   │   ├── engine.py       # Search engine
    │   │   ├── models.py       # Data models
    │   │   └── utils.py        # Utilities
    │   └── io/                 # Input/output
    │       ├── fasta.py        # FASTA handling
    │       └── sts.py          # STS handling
    ├── tests/                  # Comprehensive test suite (277 tests)
    ├── docs/                   # Documentation
    │   ├── USER_GUIDE.md       # Usage documentation
    │   ├── API.md              # API reference
    │   ├── EXAMPLES.md         # Practical examples
    │   ├── VERIFICATION.md     # Compatibility verification
    │   ├── MIGRATION.md        # Migration from me-PCR
    │   └── CI_CD.md            # CI/CD documentation
    ├── pyproject.toml          # Modern Python packaging
    └── Makefile                # Development commands

Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new features
- Ensure all tests pass (`make test`)
- Format code (`make format`)
- Submit a pull request
Development Setup:

    git clone https://github.com/FOI-Bioinformatics/merpcr.git
    cd merpcr
    make dev-install  # Install with development dependencies
    make test         # Verify installation

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
merPCR builds upon the pioneering work of:
- Gregory D. Schuler (NCBI) - Original e-PCR algorithm development
- Kevin Murphy (Children's Hospital of Philadelphia) - me-PCR multithreading enhancements

References:
- Schuler, G.D. (1997) "Sequence mapping by electronic PCR." Genome Research 7: 541-550. doi:10.1101/gr.7.5.541
- Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990) "Basic local alignment search tool." Journal of Molecular Biology 215: 403-410. doi:10.1016/S0022-2836(05)80360-2