merPCR - Modern Electronic PCR Implementation

A 100% compatible Python reimplementation of me-PCR

📖 Documentation | 🚀 Quick Start | ✅ Verification | 🔄 Migration Guide

Overview

merPCR locates Sequence-Tagged Sites (STS) within genomic sequences using computational PCR. It is a drop-in replacement for the original me-PCR (Multithreaded Electronic PCR): it produces identical results while offering a modern Python architecture, better error messages, and comprehensive documentation.

Key Highlights:

✅ 100% Compatible - Verified byte-for-byte identical output to me-PCR v1.0.6
🚀 Drop-In Replacement - Same command-line interface, no changes needed
🐍 Python API - Use programmatically in your Python workflows
📚 Well Documented - Extensive guides, examples, and API reference
🧪 Thoroughly Tested - 277 tests, 94% coverage of engine.py, validated on real genomic data

Compatibility & Validation

merPCR has been extensively validated against me-PCR:

Real Genomic Data: Tested on 42 Francisellaceae genomes (90MB)
Compatibility Tests: 15/15 passed (100%)
Output Identity: MD5 checksums match exactly
Critical Fixes: Three algorithmic differences identified and fixed
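
The output-identity check can be reproduced with a plain checksum comparison. The following is a minimal sketch, assuming both tools have already written their results to files; the function names are illustrative and not part of merPCR.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_identical(path_a, path_b):
    """True when both files have the same MD5 checksum."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

For example, `outputs_identical("mepcr_out.txt", "merpcr_out.txt")` would compare two result files (both file names are placeholders).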

The three fixes address the following areas:

Hash computation - Backward vs forward search
PCR range margins - Proper margin adjustment for size ranges
Forward strand matching - Reverse complement handling for primer2
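
The third item refers to reverse-complement handling. As a reminder of the operation itself, here is a small sketch; it only illustrates the transformation and makes no claim about how merPCR applies it internally. The helper name is hypothetical.

```python
# Translation table for the four unambiguous bases, preserving case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Characters outside A/C/G/T (for example IUPAC ambiguity codes) pass through unchanged in this sketch and would need an extended table.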

📄 Full verification details: docs/VERIFICATION.md

Quick Start

Installation

# From PyPI (when available)
pip install merpcr

# From source
git clone https://github.com/FOI-Bioinformatics/merpcr.git
cd merpcr
pip install -e .

Basic Usage

# Use merPCR exactly like me-PCR
merpcr primers.sts genome.fa

# With parameters (both formats supported)
merpcr primers.sts genome.fa -M 50 -N 1 -O results.txt
merpcr primers.sts genome.fa M=50 N=1 O=results.txt  # Legacy format works too!

# Multiple parameters, short and long options
merpcr primers.sts genome.fa -M 50 -N 1 -T 4 --debug
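
One way to accept both argument styles is to rewrite legacy `KEY=value` tokens into dash-style options before handing them to a standard parser. The sketch below shows that idea only; it is an assumption about a possible approach, not merPCR's actual parser, and `normalize_legacy_args` is a hypothetical name.

```python
import re

# A single letter, "=", then the value: the legacy me-PCR style, e.g. "M=50".
_LEGACY = re.compile(r"^([A-Za-z])=(.*)$")


def normalize_legacy_args(argv):
    """Rewrite legacy 'M=50' tokens as '-M', '50'; leave everything else alone."""
    normalized = []
    for arg in argv:
        match = _LEGACY.match(arg)
        if match:
            normalized.extend([f"-{match.group(1).upper()}", match.group(2)])
        else:
            normalized.append(arg)
    return normalized
```

The result can then be passed to `argparse` or any other dash-style parser. A file literally named like `M=50` would be misread by this sketch.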

Python API

from merpcr import MerPCR

# Initialize with parameters
engine = MerPCR(wordsize=11, margin=50, mismatches=1, threads=4)

# Load data and search
engine.load_sts_file("primers.sts")
records = engine.load_fasta_file("genome.fa")
hit_count = engine.search(records, "results.txt")

print(f"Found {hit_count} hits")

📖 More examples: docs/EXAMPLES.md

Key Features

Computational Features

Multithreaded Processing - Automatic thread scaling for large files
Hash-Based Search - O(1) STS lookup with 2-bit encoding
IUPAC Support - Optional ambiguity code handling
Flexible Parameters - Configurable margins, mismatches, and word sizes
3' Protection - Prevents mismatches in primer 3' regions
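
To illustrate what a hash-based search with 2-bit encoding looks like in general, here is a simplified sketch. It packs each base into two bits, indexes one word per primer in a dictionary, and scans a sequence with a rolling value. Which part of the primer the real engine hashes is not specified here; the sketch uses the first `wordsize` bases, and all function names are hypothetical.

```python
_CODES = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_word(word):
    """Pack a DNA word into an int, 2 bits per base; None if a base is ambiguous."""
    value = 0
    for base in word.upper():
        code = _CODES.get(base)
        if code is None:
            return None
        value = (value << 2) | code
    return value


def build_index(primers, wordsize=11):
    """Map the encoded leading word of each primer to the STS ids that share it."""
    index = {}
    for sts_id, primer in primers.items():
        if len(primer) < wordsize:
            continue  # too short to hash at this word size
        key = encode_word(primer[:wordsize])
        if key is not None:
            index.setdefault(key, []).append(sts_id)
    return index


def find_word_hits(sequence, index, wordsize=11):
    """Return (start, sts_id) for every window whose encoding is in the index."""
    mask = (1 << (2 * wordsize)) - 1
    value = 0
    valid = 0  # consecutive unambiguous bases seen so far
    hits = []
    for pos, base in enumerate(sequence.upper()):
        code = _CODES.get(base)
        if code is None:
            value, valid = 0, 0  # restart after an ambiguous base
            continue
        value = ((value << 2) | code) & mask
        valid += 1
        if valid >= wordsize:
            for sts_id in index.get(value, ()):
                hits.append((pos - wordsize + 1, sts_id))
    return hits
```

Each dictionary lookup is constant time on average, which is what makes the per-position cost independent of the number of STS records. A word hit is only a candidate; a full search would still verify both primers and the product size.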

Software Features

Modern Architecture - Type-safe Python with comprehensive error handling
Better Diagnostics - Clear error messages with context
Debug Mode - Detailed logging for troubleshooting
Extensive Testing - 277 tests covering edge cases and real data
CI/CD Pipeline - Automated testing on multiple platforms

Documentation

For Users

📖 User Guide - Comprehensive usage guide
💡 Examples - Real-world use cases and tutorials
🔄 Migration Guide - Migrating from me-PCR

For Developers

🔧 API Reference - Complete API documentation
✅ Verification Report - Compatibility testing details
⚙️ CI/CD Documentation - GitHub Actions workflows

Quick Reference

Input Format (STS file):

STS_ID	Forward_Primer	Reverse_Primer	PCR_Size	[Optional_Alias]
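
A line in this format can be parsed with a few lines of Python. The sketch below assumes tab-separated fields and that `PCR_Size` is either a single integer or a `low-high` range; the range syntax is an assumption inferred from the mention of size ranges above. `STSRecord` and `parse_sts_line` are hypothetical names, not merPCR's API.

```python
from typing import NamedTuple, Optional


class STSRecord(NamedTuple):
    sts_id: str
    primer1: str
    primer2: str
    size_min: int
    size_max: int
    alias: Optional[str]


def parse_sts_line(line):
    """Parse one tab-separated STS line into an STSRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 tab-separated fields, got {len(fields)}")
    sts_id, primer1, primer2, size = fields[:4]
    alias = fields[4] if len(fields) > 4 and fields[4] else None
    if "-" in size:
        low, high = size.split("-", 1)  # assumed 'low-high' range syntax
        size_min, size_max = int(low), int(high)
    else:
        size_min = size_max = int(size)
    return STSRecord(sts_id, primer1, primer2, size_min, size_max, alias)
```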

Output Format:

Sequence_ID	pos1..pos2	STS_ID	Alias	(+/-)
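
For downstream processing, an output line can be split back into its fields. This is a rough sketch that assumes exactly five tab-separated columns and an orientation written as `(+)` or `(-)`; `parse_hit_line` is a hypothetical helper.

```python
def parse_hit_line(line):
    """Split one output line into a dict with integer start/end positions."""
    seq_id, span, sts_id, alias, strand = line.rstrip("\n").split("\t")
    start, end = span.split("..")
    return {
        "sequence_id": seq_id,
        "start": int(start),
        "end": int(end),
        "sts_id": sts_id,
        "alias": alias,
        "strand": strand.strip("()"),  # "(+)" -> "+"
    }
```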

Common Parameters:

-M, --margin - Search margin in bp (default: 50)
-N, --mismatches - Allowed mismatches (default: 0)
-W, --wordsize - Hash word size (default: 11)
-T, --threads - Number of threads (default: 1)
-O, --output - Output file (default: stdout)
--debug - Enable debug logging
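
As an illustration of how the margin and mismatch parameters constrain a candidate hit, consider the sketch below. It assumes the margin is the allowed deviation of the observed product size from the expected size (or size range), and that mismatches are counted position by position; both helpers are hypothetical and simplify whatever the engine does internally (for example, 3' protection is not modelled).

```python
def size_within_margin(observed, expected_min, expected_max, margin=50):
    """True if the observed product size is within the expected range +/- margin."""
    return expected_min - margin <= observed <= expected_max + margin


def count_mismatches(primer, target):
    """Count differing positions between two equal-length strings."""
    if len(primer) != len(target):
        raise ValueError("primer and target must have the same length")
    return sum(1 for a, b in zip(primer.upper(), target.upper()) if a != b)
```

With the defaults listed above (margin 50, mismatches 0), an STS with expected size 150 would accept products of 100 to 200 bp whose primer sites match exactly.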

📖 Full parameter list: docs/USER_GUIDE.md#parameters

Testing

# Run all tests
make test

# Run specific test categories
make test-unit          # Unit tests only
make test-integration   # Integration tests
make test-performance   # Performance benchmarks

# Generate coverage report
make coverage

# Run compatibility tests
python test_compatibility.py

Current Status:

277 tests (all passing)
94% code coverage for engine.py (critical component)
Real genomic data validation on 42 genomes

Performance

With Cython optimization enabled, merPCR matches or exceeds me-PCR performance:

Dataset Size	me-PCR	merPCR (Pure Python)	merPCR (Cython)	Speedup (Cython vs Pure Python)
Small (<2MB)	~0.5s	~0.5s	~0.4s	2.1x
Medium (2-4MB)	~0.8s	~0.8s	~0.3s	2.6-2.9x
Large (>4MB)	Scales linearly	Scales linearly	Scales linearly (2.9x faster)	2.9x+

Real Genomic Data (Francisellaceae genomes):

F. tularensis (1.8 MB): 0.24s with Cython vs 0.51s pure Python (2.1x speedup)
C. litorale (3.1 MB): 0.30s with Cython vs 0.86s pure Python (2.8x speedup)
F. hongkongensis (2.8 MB): 0.27s with Cython vs 0.80s pure Python (2.9x speedup)

Average speedup: 2.65x over pure Python.
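
The average can be recomputed from the three timings listed above. Because the timings are rounded to two decimals, an individual ratio may differ from the quoted per-genome figure in the last digit.

```python
# (pure Python seconds, Cython seconds) as quoted in the list above
timings = {
    "F. tularensis": (0.51, 0.24),
    "C. litorale": (0.86, 0.30),
    "F. hongkongensis": (0.80, 0.27),
}

speedups = {name: pure / cython for name, (pure, cython) in timings.items()}
average_speedup = sum(speedups.values()) / len(speedups)

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
print(f"average: {average_speedup:.2f}x")
```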

🚀 Performance Features:

Automatic Cython optimization (if available)
Seamless fallback to pure Python
Multithreading support for large files
NumPy-accelerated lookup tables
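
The automatic optimization with fallback listed above is commonly implemented as a guarded import. The sketch below shows that general pattern under stated assumptions: the module name `_merpcr_accel_hypothetical` and the function `count_matches` are invented for illustration and do not refer to merPCR's real modules.

```python
import importlib


def _count_matches_py(a, b):
    """Pure Python fallback: number of positions where two strings agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


def load_backend(accelerated_module="_merpcr_accel_hypothetical"):
    """Return (function, backend_name), preferring a compiled module if importable."""
    try:
        module = importlib.import_module(accelerated_module)
        return module.count_matches, "cython"
    except (ImportError, AttributeError):
        # Compiled extension missing or incomplete: fall back silently.
        return _count_matches_py, "python"


count_matches, backend = load_backend()
```

Callers use `count_matches` without knowing which backend was selected, so the compiled extension stays an optional dependency.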

📊 Full performance details: docs/PERFORMANCE.md

Project Structure

merpcr/
├── src/merpcr/          # Main package
│   ├── cli.py           # Command-line interface
│   ├── core/            # Core functionality
│   │   ├── engine.py    # Search engine
│   │   ├── models.py    # Data models
│   │   └── utils.py     # Utilities
│   └── io/              # Input/output
│       ├── fasta.py     # FASTA handling
│       └── sts.py       # STS handling
├── tests/               # Comprehensive test suite (277 tests)
├── docs/                # Documentation
│   ├── USER_GUIDE.md    # Usage documentation
│   ├── API.md           # API reference
│   ├── EXAMPLES.md      # Practical examples
│   ├── VERIFICATION.md  # Compatibility verification
│   ├── MIGRATION.md     # Migration from me-PCR
│   └── CI_CD.md         # CI/CD documentation
├── pyproject.toml       # Modern Python packaging
└── Makefile             # Development commands

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch
Add tests for new features
Ensure all tests pass (make test)
Format code (make format)
Submit a pull request

Development Setup:

git clone https://github.com/FOI-Bioinformatics/merpcr.git
cd merpcr
make dev-install  # Install with development dependencies
make test         # Verify installation

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

Acknowledgments

merPCR builds upon the pioneering work of:

Gregory D. Schuler (NCBI) - Original e-PCR algorithm development
Kevin Murphy (Children's Hospital of Philadelphia) - me-PCR multithreading enhancements

References

Schuler, G.D. (1997) "Sequence mapping by electronic PCR." Genome Research 7: 541-550. doi:10.1101/gr.7.5.541
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990) "Basic local alignment search tool." Journal of Molecular Biology 215: 403-410. doi:10.1016/S0022-2836(05)80360-2
