| 1 | +# BioAlign - DNA Sequence Alignment Tool |
| 2 | + |
| 3 | +BioAlign is a tool for aligning and visualizing DNA sequences. It runs Clustal Omega to compute the alignment and writes the result to a Word document in triplet notation, with optional highlighting of a sequence you search for. |
| 4 | + |
| 5 | +## Features |
| 6 | + |
| 7 | +- Automatic DNA sequence alignment using Clustal Omega |
| 8 | +- Triplet notation formatting for better readability |
| 9 | +- Search and highlight specific DNA sequences in the alignment |
| 10 | +- Option for spaced or exact sequence matching |
| 11 | +- Different highlight colors for each sequence (optional) |
| 12 | +- Caching of alignment results for unchanged sequences |
| 13 | + |
| 14 | +## Installation |
| 15 | + |
| 16 | +No installation required! The release zip file contains everything you need: |
| 17 | + |
| 18 | +1. Unzip the file to any location on your computer |
| 19 | +2. Run `start.bat` to launch the application |
| 20 | + |
| 21 | +The package includes: |
| 22 | +- Embedded Python 3.13.2 runtime |
| 23 | +- Clustal Omega 1.2.2 executable |
| 24 | +- All required Python dependencies |
| 25 | + |
| 26 | +## Usage |
| 27 | + |
| 28 | +### Preparing Your Sequences |
| 29 | + |
| 30 | +Create or edit the `sequences.json` file in the application folder. This file should contain your DNA sequences in the following format: |
| 31 | + |
| 32 | +```json |
| 33 | +{ |
| 34 | + "Sequence1": "ATGCCTGACCTAGTCGATCGATGCTA", |
| 35 | + "Sequence2": "ATGCGTGACCTAGTTGATCGATGCTA", |
| 36 | + "Sequence3": "ATGCCTGACCAAGTCGATCTATGCTA" |
| 37 | +} |
| 38 | +``` |
| 39 | + |
| 40 | +Where: |
| 41 | +- Each key is the sequence name |
| 42 | +- Each value is the DNA sequence |
| 43 | +- You can add as many sequences as needed |
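As an illustration of the format above, the following minimal sketch parses and checks a `sequences.json` document before alignment. It is not BioAlign's own loader: the function name `load_sequences`, the allowed alphabet in `VALID_BASES`, and the whitespace/uppercase normalization are all assumptions made for the example.

```python
import json

# Assumption: the accepted alphabet is not stated in this README.
VALID_BASES = set("ACGTN")

def load_sequences(text):
    """Parse sequences.json content and return {name: normalized sequence}."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")
    cleaned = {}
    for name, seq in data.items():
        if not isinstance(seq, str):
            raise ValueError(f"{name}: sequence must be a string")
        # Hypothetical normalization: drop whitespace and uppercase the bases.
        seq = "".join(seq.split()).upper()
        bad = set(seq) - VALID_BASES
        if bad:
            raise ValueError(f"{name}: unexpected characters {sorted(bad)}")
        cleaned[name] = seq
    return cleaned

example = '{"Sequence1": "ATGCCTGACCTAGTCGATCGATGCTA", "Sequence2": "atgc gtg"}'
sequences = load_sequences(example)
```

Checking the input up front gives a readable error message for a malformed entry instead of a failure later in the alignment step.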
| 44 | + |
| 45 | +### Running the Application |
| 46 | + |
| 47 | +1. Double-click the `start.bat` file |
| 48 | +2. The program will: |
| 49 | + - Load your sequences from `sequences.json` |
| 50 | + - Perform sequence alignment (or use cached results if unchanged) |
| 51 | + - Prompt you for search options |
| 52 | + |
| 53 | +### Search Options |
| 54 | + |
| 55 | +When prompted: |
| 56 | + |
| 57 | +1. Enter a DNA sequence to search for (e.g., "CTG"), or leave the prompt empty to disable highlighting |
| 58 | +2. Choose search mode: |
| 59 | + - `exact`: matches the search sequence only where its nucleotides are contiguous, with no spaces between them |
| 60 | + - `spaced`: also matches where spaces separate the nucleotides of the search sequence |
| 61 | +3. Choose whether to use separate colors for each sequence (yes/no) |
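The difference between the two search modes is easiest to see on a sequence in triplet notation. The sketch below is one possible way to implement them with regular expressions, not necessarily how BioAlign does it; `to_triplets` and `find_matches` are hypothetical names, and the sketch assumes that spaces are the only separators a match may span.

```python
import re

def to_triplets(seq):
    """Group a sequence into space-separated triplets: ATGCCT -> ATG CCT."""
    return " ".join(seq[i:i + 3] for i in range(0, len(seq), 3))

def find_matches(text, query, mode="exact"):
    """Return (start, end) spans of non-overlapping matches of query in text."""
    query = query.upper().replace(" ", "")
    if not query:
        return []  # an empty query disables highlighting
    if mode == "exact":
        pattern = re.escape(query)
    elif mode == "spaced":
        # Allow any number of spaces between consecutive nucleotides.
        pattern = " *".join(re.escape(base) for base in query)
    else:
        raise ValueError(f"unknown search mode: {mode!r}")
    return [m.span() for m in re.finditer(pattern, text)]

formatted = to_triplets("ATGCCTGACCTA")            # "ATG CCT GAC CTA"
exact_hits = find_matches(formatted, "CTG", "exact")    # []
spaced_hits = find_matches(formatted, "CTG", "spaced")  # [(5, 9)]
```

In this example the occurrence of "CTG" straddles a triplet boundary ("CT G"), so only `spaced` mode finds it.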
| 62 | + |
| 63 | +### Output |
| 64 | + |
| 65 | +The program generates: |
| 66 | +- `sequences.fasta`: The input file for Clustal Omega |
| 67 | +- `sequences.aln`: The alignment result from Clustal Omega |
| 68 | +- `sequences.docx`: The final Word document with formatted alignment and highlighting |
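For reference, the first of these files is a plain FASTA rendering of the JSON input. A minimal sketch of that conversion is shown below; `to_fasta` and the 60-character line width are assumptions for illustration, and BioAlign itself may produce the file differently (for example through Biopython).

```python
def to_fasta(sequences, width=60):
    """Render {name: sequence} as FASTA text with lines wrapped at `width`."""
    lines = []
    for name, seq in sequences.items():
        lines.append(f">{name}")  # header line: '>' followed by the sequence name
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

fasta_text = to_fasta({
    "Sequence1": "ATGCCTGACCTAGTCGATCGATGCTA",
    "Sequence2": "ATGCGTGACCTAGTTGATCGATGCTA",
})
```

Each record is a `>` header line followed by the sequence itself.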
| 69 | + |
| 70 | +## Example |
| 71 | + |
| 72 | +For the provided example sequences, searching for "CTG" in `spaced` mode highlights every occurrence of that pattern in each sequence, which makes variations between the sequences easy to compare. |
| 73 | + |
| 74 | +## Notes |
| 75 | + |
| 76 | +- The tool caches alignment results to avoid redundant calculations |
| 77 | +- The Word document uses Courier New font for consistent spacing |
| 78 | +- Highlighting uses yellow by default, or green/blue/pink when using separate colors |
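One common way to detect unchanged input for caching is to store a fingerprint of the sequences next to the alignment and compare it on the next run. The sketch below illustrates that idea only; how BioAlign actually decides that sequences are unchanged is not described here, and `sequences_fingerprint` and `needs_realignment` are hypothetical names.

```python
import hashlib
import json

def sequences_fingerprint(sequences):
    """Return a SHA-256 hex digest of the sequence names and contents."""
    # Key order is kept as given, so reordering sequences changes the digest.
    canonical = json.dumps(sequences, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def needs_realignment(sequences, cached_fingerprint):
    """True when the sequences differ from those behind the cached result."""
    return sequences_fingerprint(sequences) != cached_fingerprint

original = {"Sequence1": "ATGCCTGACC", "Sequence2": "ATGCGTGACC"}
cached = sequences_fingerprint(original)
edited = dict(original, Sequence2="ATGCGTGACT")
```

With this approach, changing, adding, removing, or reordering a sequence triggers a new alignment, while rerunning with identical input reuses the cached result.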
| 79 | + |
| 80 | +## Requirements |
| 81 | + |
| 82 | +This package is self-contained and works on Windows systems without additional installations. |
| 83 | + |
| 84 | +## Acknowledgements |
| 85 | + |
| 86 | +- [Clustal Omega](http://www.clustal.org/omega/) for sequence alignment |
| 87 | +- [Biopython](https://biopython.org/) for biological sequence handling |
| 88 | +- [python-docx](https://python-docx.readthedocs.io/) for Word document generation |