Commit 39479f4

feat(v1.1.0): Add HTML output and improve sequence matching algorithm

- Add HTML output support for lightweight browser-viewable results
- Completely overhaul the DNA sequence marking algorithm
- Implement cross-line pattern matching to detect sequences spanning multiple lines
- Fix bugs with space handling in the spaced matching mode
- Improve code documentation with detailed function docstrings
- Refactor variable names for better consistency with Python coding standards

This release significantly improves pattern detection accuracy and adds an alternative output format for users without Microsoft Word.

1 parent b407ceb commit 39479f4
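
The commit message describes space-ignoring matching that can also follow a pattern from one line onto the next. The changes to `main.py` are not shown here, so the following is only a minimal sketch of one way such matching could work, not the commit's actual implementation. The function name `find_spaced_matches` and its return shape are hypothetical; alignment gap characters are not treated specially.

```python
def find_spaced_matches(lines, pattern):
    """Return, per match, the (line_index, column) positions of its characters.

    Hypothetical helper: spaces are ignored, and a match may continue
    from the end of one line onto the start of the next.
    """
    pattern = pattern.replace(" ", "").upper()
    if not pattern:
        return []

    # Flatten all lines into one space-free string, remembering where
    # each kept character originally came from.
    flat_chars = []
    origins = []
    for line_index, line in enumerate(lines):
        for column, char in enumerate(line):
            if char != " ":
                flat_chars.append(char.upper())
                origins.append((line_index, column))
    flat = "".join(flat_chars)

    matches = []
    start = flat.find(pattern)
    while start != -1:
        # Map the hit in the flattened string back to original positions.
        matches.append(origins[start:start + len(pattern)])
        # Advance by one so overlapping occurrences are reported too.
        start = flat.find(pattern, start + 1)
    return matches


if __name__ == "__main__":
    # Two lines of one sequence in triplet notation (made-up data).
    formatted = ["ATG CCT", "GAA CTG"]
    for positions in find_spaced_matches(formatted, "CTG"):
        print(positions)
```

Keeping a list of original positions alongside the flattened string means the highlighter can colour exactly the matched characters in the formatted output, leaving the triplet spaces and line breaks between them untouched.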

3 files changed: 592 additions and 184 deletions
.gitignore

Lines changed: 8 additions & 4 deletions

@@ -1,12 +1,16 @@
+# editor
 .idea
 .vscode
 
-.sequencehash
+# requirements
 clustal-omega-1.2.2-win64
 python-3.13.2-embed-amd64
-Bio.zip
+start.bat
+
+# output
+.sequencehash
 sequences.aln
 sequences.fasta
-sequences.docx
-start.bat
 sequences.json
+sequences.docx
+sequences.html

README.md

Lines changed: 26 additions & 11 deletions

@@ -1,8 +1,9 @@
-# BioAlign - DNA Sequence Alignment Tool
+# BioAlign - DNA Sequence Alignment and Marking Tool
 
-BioAlign is a user-friendly tool for DNA sequence alignment and visualization. It uses Clustal Omega for alignment and creates nicely formatted Word documents with customizable sequence highlighting.
+BioAlign is a user-friendly tool for DNA sequence alignment, marking and visualization. It uses Clustal Omega for alignment and creates nicely formatted HTML and Word documents with sequence highlighting.
 
 ## Table of Contents
+
 - [Features](#features)
 - [Installation](#installation)
 - [Usage](#usage)
@@ -27,8 +28,9 @@ BioAlign is a user-friendly tool for DNA sequence alignment and visualization. I
 - Automatic DNA sequence alignment using Clustal Omega
 - Triplet notation formatting for better readability
 - Search and highlight specific DNA sequences in the alignment
-- Option for spaced or exact sequence matching
-- Different highlight colors for each sequence (optional)
+- Advanced pattern matching with both exact and space-ignoring modes
+- Cross-line sequence matching (finds patterns spanning multiple sequence lines)
+- Output to both HTML and Word documents with highlighted matches
 - Caching of alignment results for unchanged sequences
 
 ## Installation
@@ -39,6 +41,7 @@ No installation required! The release zip file contains everything you need:
 2. Run `start.bat` to launch the application
 
 The package includes:
+
 - Embedded Python 3.13.2 runtime
 - Clustal Omega 1.2.2 executable
 - All required Python dependencies
@@ -58,6 +61,7 @@ Create or edit the `sequences.json` file in the application folder. This file sh
 ```
 
 Where:
+
 - Each key is the sequence name
 - Each value is the DNA sequence
 - You can add as many sequences as needed
@@ -76,26 +80,32 @@ When prompted:
 
 1. Enter a DNA sequence to search for (e.g., "CTG") or leave empty to disable highlighting
 2. Choose search mode:
-- `exact`: Matches only exact sequences without spaces
-- `spaced`: Matches sequences allowing for spaces between nucleotides
-3. Choose whether to use separate colors for each sequence (yes/no)
+- `exact`: Matches only exact sequences including spaces
+- `spaced`: Ignores spaces during matching, useful for finding patterns across triplet notation
+
+The improved spaced mode can now correctly identify patterns that span across the triplet spaces and sequence lines in the formatted output.
 
 ### Output
 
 The program generates:
+
 - `sequences.fasta`: The input file for Clustal Omega
 - `sequences.aln`: The alignment result from Clustal Omega
-- `sequences.docx`: The final Word document with formatted alignment and highlighting
+- `sequences.html`: HTML document with formatted alignment and highlighting
+- `sequences.docx`: Word document with formatted alignment and highlighting
+
+The HTML output provides a lightweight, browser-viewable alternative that doesn't require Microsoft Word to open.
 
 ## Example
 
-For the provided example sequences, searching for "CTG" with spaced mode enabled will highlight this pattern in all sequences, allowing you to easily compare variations.
+For the provided example sequences, searching for "CTG" with spaced mode enabled will highlight this pattern in all sequences, allowing you to easily compare variations. The improved algorithm will now correctly find instances even when they span across the spaces in triplet notation.
 
 ## Notes
 
 - The tool caches alignment results to avoid redundant calculations
 - The Word document uses Courier New font for consistent spacing
-- Highlighting uses yellow by default, or green/yellow/pink when using separate colors
+- The HTML output uses the same monospace formatting for consistency with the Word document
+- The improved marking algorithm can now detect patterns that span across multiple lines of the same sequence
 
 ## Developer Setup
 
@@ -110,35 +120,40 @@ pip install biopython python-docx
 ### Installing Clustal Omega
 
 1. **Windows**:
-- Download Clustal Omega from http://www.clustal.org/omega/
+- Download Clustal Omega from <http://www.clustal.org/omega/>
 - Extract the files to a directory named `clustal-omega-1.2.2-win64` in the same folder as main.py
 - Ensure that `clustalo.exe` is directly inside this directory
 
 2. **macOS**:
 - Install via Homebrew: `brew install clustal-omega`
 - Modify the path in the prepare_seq function where Clustal Omega is called to use:
+
 ```python
 command_return = subprocess.run(f"clustalo --infile {input_file_name} --outfile {output_file_name} --outfmt clustal --force", shell=True)
 ```
 
 3. **Linux**:
 - Install via package manager: `sudo apt install clustal-omega` (Ubuntu/Debian) or equivalent
 - Modify the path in the prepare_seq function where Clustal Omega is called to use:
+
 ```python
 command_return = subprocess.run(f"clustalo --infile {input_file_name} --outfile {output_file_name} --outfmt clustal --force", shell=True)
 ```
 
 ### Required Files Structure
 
 Create the following files in your working directory:
+
 - `main.py` - The main program
 - `sequences.json` - Your DNA sequences in JSON format
+- `start.bat` (optional) - For easy launching on Windows
 
 ## Requirements
 
 For end users, the release package is self-contained and works on Windows systems without additional installations.
 
 For developers working with just the source code:
+
 - Python 3.6 or higher
 - Biopython library
 - python-docx library
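
The README mentions caching of alignment results for unchanged sequences, and `.gitignore` lists a `.sequencehash` file among the outputs. How the tool actually uses that file is not shown in this diff. The sketch below is one plausible approach, assuming the file holds a single hex digest of the input sequences; the function names are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sequences_digest(sequences):
    """Return a stable SHA-256 hex digest for a name -> sequence mapping."""
    # Sorting keys makes the digest independent of key order in the JSON file.
    canonical = json.dumps(sequences, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def alignment_is_current(sequences, hash_path, alignment_path):
    """True if the stored digest matches and the alignment file still exists."""
    hash_path, alignment_path = Path(hash_path), Path(alignment_path)
    if not (hash_path.is_file() and alignment_path.is_file()):
        return False
    stored = hash_path.read_text(encoding="utf-8").strip()
    return stored == sequences_digest(sequences)


def record_digest(sequences, hash_path):
    """Store the digest after a successful alignment run."""
    Path(hash_path).write_text(sequences_digest(sequences), encoding="utf-8")


if __name__ == "__main__":
    # Made-up sequences and a throwaway directory, for demonstration only.
    demo = {"seq1": "ATGCCTGAA", "seq2": "ATGCTTGAA"}
    with tempfile.TemporaryDirectory() as tmp:
        hash_file = Path(tmp) / ".sequencehash"
        aln_file = Path(tmp) / "sequences.aln"
        print(alignment_is_current(demo, hash_file, aln_file))  # nothing cached yet
        aln_file.write_text("placeholder alignment", encoding="utf-8")
        record_digest(demo, hash_file)
        print(alignment_is_current(demo, hash_file, aln_file))  # digest now matches
```

Under this scheme the Clustal Omega step would run only when `alignment_is_current` returns `False`, so repeated searches over the same `sequences.json` reuse the existing `sequences.aln`.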
