Commit ebaa196

update paper and change name
1 parent c62f0ae commit ebaa196

7 files changed

+64
-67
lines changed

Cargo.lock

Lines changed: 20 additions & 20 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 5 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -1,11 +1,11 @@
11
[package]
2-
name = "fastchaos"
2+
name = "chaoscoder"
33
version = "0.1.0"
44
authors = ["Anicet Ebou <anicet.ebou@gmail.com>"]
55
edition = "2021"
6-
description = "fastchaos encode, decode and analyze DNA sequence using integer Chaos Game Representation"
7-
homepage = "https://github.com/Ebedthan/fastchaos"
8-
repository = "https://github.com/Ebedthan/fastchaos"
6+
description = "chaoscoder encode, decode and analyze DNA sequence using integer Chaos Game Representation"
7+
homepage = "https://github.com/Ebedthan/chaoscoder"
8+
repository = "https://github.com/Ebedthan/chaoscoder"
99
readme = "README.md"
1010
license = "MIT"
1111
categories = ["command-line-utilities"]
@@ -42,6 +42,6 @@ inherits = "release"
4242
lto = "thin"
4343

4444
[[bin]]
45-
name = "fastchaos"
45+
name = "chaoscoder"
4646
path = "src/main.rs"
4747
bench = false

README.md

Lines changed: 14 additions & 14 deletions
@@ -1,44 +1,44 @@
1-
# fastchaos
1+
# chaoscoder
22

33
*DNA Sequence encoding, decoding and analysis using (Integer) Chaos Game Representation*
44

5-
[![Continuous Integration](https://github.com/Ebedthan/fastchaos/actions/workflows/ci.yml/badge.svg)](https://github.com/Ebedthan/fastchaos/actions/workflows/ci.yml)
6-
[![codecov](https://codecov.io/gh/Ebedthan/fastchaos/branch/main/graph/badge.svg?token=K7VN5TH6EZ)](https://codecov.io/gh/Ebedthan/fastchaos)
7-
<a href="https://github.com/Ebedthan/fastchaos/blob/master/LICENSE">
5+
[![Continuous Integration](https://github.com/Ebedthan/chaoscoder/actions/workflows/ci.yml/badge.svg)](https://github.com/Ebedthan/chaoscoder/actions/workflows/ci.yml)
6+
[![codecov](https://codecov.io/gh/Ebedthan/chaoscoder/branch/main/graph/badge.svg?token=K7VN5TH6EZ)](https://codecov.io/gh/Ebedthan/chaoscoder)
7+
<a href="https://github.com/Ebedthan/chaoscoder/blob/master/LICENSE">
88
<img src="https://img.shields.io/badge/license-MIT-blue?style=flat">
99
</a>
1010
<br/>
1111

1212
## 🗺️ Overview
13-
`fastchaos` implement [integer chaos game representation (iCGR) algorithm](https://www.liebertpub.com/doi/abs/10.1089/cmb.2018.0173) for DNA sequence encoding and decoding. `fastchaos` is the first complete implementation of the algorithm in a bioinformatic tool aiming at users. It also add to the original algorithm a output file format which is a `zst` compressed JSON file containing the 3 integers of 100bp subsequences of the supplied sequence. This allow fast encoding and decoding.
13+
`chaoscoder` implement [integer chaos game representation (iCGR) algorithm](https://www.liebertpub.com/doi/abs/10.1089/cmb.2018.0173) for DNA sequence encoding and decoding. `chaoscoder` is the first complete implementation of the algorithm in a bioinformatic tool aiming at users. It also add to the original algorithm a output file format which is a `zst` compressed JSON file containing the 3 integers of 100bp subsequences of the supplied sequence. This allow fast encoding and decoding.
1414

15-
`fastchaos` also implements [chaos game representation (CGR) of DNA sequence](https://academic.oup.com/nar/article-abstract/18/8/2163/2383530) in a fast tool that draw the representation of a sequence and can compare the CGR image using the [DSSIM algorithm](https://github.com/kornelski/dssim/).
15+
`chaoscoder` also implements [chaos game representation (CGR) of DNA sequence](https://academic.oup.com/nar/article-abstract/18/8/2163/2383530) in a fast tool that draw the representation of a sequence and can compare the CGR image using the [DSSIM algorithm](https://github.com/kornelski/dssim/).
1616

1717
## Installation
1818

1919
```bash
20-
git clone https://github.com/Ebedthan/fastchaos.git
21-
cd fastchaos
20+
git clone https://github.com/Ebedthan/chaoscoder.git
21+
cd chaoscoder
2222
cargo build --release
2323
```
2424

2525
## User guide
2626

2727
```bash
2828
# Encoding DNA sequence into integer chaos game representation
29-
fastchaos encode seq.fa
29+
chaoscoder encode seq.fa
3030

3131
# Decoding integer chaos game representation into DNA sequence
32-
fastchaos decode seq.icgr
32+
chaoscoder decode seq.icgr
3333

3434
# Draw chaos game representation of DNA sequence
35-
fastchaos draw seq.fa
35+
chaoscoder draw seq.fa
3636

3737
# Compare multiple chaos game representation image using DSSIM
38-
fastchaos compare images_dir
38+
chaoscoder compare images_dir
3939
```
4040

41-
For full details, do `fastchaos -h`.
41+
For full details, do `chaoscoder -h`.
4242

4343
### Requirements
4444
- [Rust](https://rust-lang.org) in stable channel
@@ -49,7 +49,7 @@ This crate's minimum supported `rustc` version is `1.82.0`.
4949

5050

5151
### Bugs
52-
Submit problems or requests to the [Issue Tracker](https://github.com/Ebedthan/fastchaos/issues).
52+
Submit problems or requests to the [Issue Tracker](https://github.com/Ebedthan/chaoscoder/issues).
5353

5454

5555
### License

dist-workspace.toml

Lines changed: 0 additions & 13 deletions
This file was deleted.

paper.md

Lines changed: 23 additions & 13 deletions
@@ -1,5 +1,5 @@
11
---
2-
title: 'fastchaos: block-based integer chaos game representation encoding and decoding of DNA sequences'
2+
title: 'chaoscoder: block-based integer chaos game representation encoding and decoding of DNA sequences'
33
tags:
44
- DNA sequence analysis
55
- Chaos game representation
@@ -13,42 +13,52 @@ authors:
1313
orcid: 0000-0002-9078-8844
1414
affiliation: 1
1515
affiliations:
16-
- name: Equipe Bioinformatique et Biostatistique, Laboratoire de Microbiologie, Biotechnologie et Bioinformatique, Institut National Polytechnique Félix Houphouët-Boigny, Côte d'Ivoire
16+
- name: Equipe Bioinformatique et Biostatistique, Laboratoire de Microbiologie, Biotechnologie et Bioinformatique, Institut National Polytechnique Félix Houphouët-Boigny, BP 1093 Yamoussoukro, Côte d'Ivoire
1717
index: 1
1818
date: 15 July 2025
1919
bibliography: paper.bib
2020
---
2121

2222
# Summary
2323

24-
Computational analysis of DNA sequences is fundamental in modern bioinformatics, enabling tasks such as classification, genome comparison, mutation detection, and evolutionary studies. To support these analyses, DNA sequences, represented as strings of nucleotide letters (A, T, C, G), must be converted into numerical formats suitable for mathematical operations and machine learning workflows.
24+
Computational analysis of DNA sequences underpins numerous bioinformatics applications, including sequence classification, genome comparison, mutation detection, and evolutionary studies. These tasks often require transforming symbolic nucleotide sequences (A, T, C, G) into numerical representations suitable for mathematical processing or machine learning.
2525

26-
One widely used encoding method is the Chaos Game Representation (CGR), which maps sequences onto a 2D space, revealing compositional and structural patterns [@jeffrey_chaos_1990; @vinga_pattern_2012]. However, CGR relies on floating-point arithmetic, which introduces rounding errors and limits precision-especially problematic for long sequences and exact sequence reconstruction.
26+
Chaos Game Representation (CGR) is a well-established method that encodes DNA sequences as points in a 2D space, revealing motifs and structural patterns [@jeffrey_chaos_1990]. However, traditional CGR depends on floating-point arithmetic, leading to rounding errors and imprecision—especially when applied to long sequences or tasks that require exact sequence reconstruction.
2727

28-
To address these limitations, our software implements the Integer Chaos Game Representation (iCGR), a mathematically robust alternative that operates entirely in integer space [@yin_encoding_2018]. This guarantees lossless encoding and decoding. Furthermore, we introduce a block-based iCGR algorithm that enables the encoding of long genomic sequences by processing them in overlapping segments. This makes the method scalable and compatible with high-throughput genome analysis.
28+
`chaoscoder` implements the Integer Chaos Game Representation (iCGR), a variant that operates entirely in integer space to provide lossless encoding and decoding [@yin_encoding_2018]. To address the exponential scaling limitation of iCGR, the software introduces a block-based variant that divides sequences into overlapping segments, enabling scalable and parallelizable encoding of genome-length sequences.
2929

30-
The software supports efficient encoding, decoding, and standardized storage of iCGR coordinates. It is designed to be fast, precise, and extensible, making it suitable for a wide range of genomic applications where reliability and performance are essential.
30+
The software provides a command-line interface for encoding, decoding, visualizing CGRs, and comparing sequence structure via image-based SSIM (Structural Similarity Index Measure). It supports standardized storage of encoded data in a custom `.bicgr` file format, designed for efficient downstream use.
31+
32+
Written in Rust for performance and reliability, `chaoscoder` is well-suited for researchers and developers working with large-scale genomic datasets where precision, reversibility, and scalability are essential.
3133

3234
# Implementation
3335

34-
## Encoding and decoding DNA sequences by block-based integer CGR
36+
## Encoding and decoding DNA sequences by integer CGR
37+
38+
`chaoscoder` provides a CLI to encode and decode DNA sequences using the iCGR algorithm proposed by Yin [@yin_encoding_2018]. For sequences shorter than 100 nucleotides, the classic iCGR approach is used, mapping each base to integer coordinates without rounding errors.
39+
40+
## Block-based encoding
3541

36-
To solve the problem of exponential scaling that limits the iCGR method to sequence length of 100 nt, we propose a block-based approach consisting of splitting sequences into fixed-size blocks (e.g. 50-100 nt) to ensure that the computation remain within harware limits. The algorithms first split sequences into overlapping fragments based on input from the user (Figure 1) and then encode subsequences into tri-integers based on the iCGR algorithm defined by Yin [@yin_encoding_2018].
42+
Due to the exponential nature of coordinate growth in iCGR, encoding long sequences (e.g., full genomes) directly is computationally infeasible. To mitigate this, `chaoscoder` implements a block-based iCGR approach. Sequences are partitioned into fixed-size, optionally overlapping segments (e.g., 50-100 nt), each of which is independently encoded using the iCGR algorithm (Figure 1).
3743

38-
## The block-based integer chaos game representation file format
44+
The result is a scalable encoding strategy that maintains the reversibility and precision of iCGR while enabling genome-scale processing.
3945

40-
The file structure of a block-based integer chaos game representation file (.bicgr) follows a tab-separated-like format (Figure 2).
46+
## The `.bicgr` file format
47+
48+
The block-based integer Chaos Game Representation (.bicgr) format is a custom tab-separated file structure (Figure 2).
4149

4250
![Example of a .bicgr file](bicgr.png)
4351

44-
The BICGR format specifies three mandatory columns and one optional section. The first section is the sequence ID which is mandatory while the second field is the sequence description which is optional. The first section is the overlap argument used by the encoding algorithm. The triintegers are listed as x, y, and n for each block and are arranged according to the 5 to 3 'orientation' of the DNA strand, as outputted by the encoding algorithm.
52+
It includes the sequence ID (mandatory), the sequence description (optional), the overlap parameter used during encoding and the iCGR tri-integer coordinates (`x`, `y`, and `n`) for each block, listed in 5' to 3' orientation. This structure ensures consistent, interpretable, and easily parsable output for integration into downstream pipelines.
4553

4654
## Other features
4755

48-
`fastchaos` has several other useful features. First, `fastchaos` can draw the traditional CGR of a sequences and compare them with other sequence's CGR by computing the structural similarity index (SSIM) between the two images. This allows for a more nuanced comparison of sequence similarity beyond simple sequence alignment. Second `fastchaos` takes advantage of multithreading to speed up the encoding and decoding process.
56+
`chaoscoder` offers additional functionalities to support exploratory and comparative genomics. First, the software can generate 2D CGR images for encoded sequences. Second, users can compute Structural Similarity Index (SSIM) between CGR images to compare sequence patterns without alignment.
57+
Finally, encoding and decoding tasks are multithreaded to improve performance on large datasets.
58+
4959

5060
# Installation
5161

52-
The `fastchaos` software is programmed in Rust, and is available on GitHub at [https://github.com/Ebedthan/fastchaos](https://github.com/Ebedthan/fastchaos).
62+
`chaoscoder` is written in Rust and distributed via GitHub at [https://github.com/Ebedthan/chaoscoder](https://github.com/Ebedthan/chaoscoder).
5363

5464
# References
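
Note on the block-based encoding described in the `paper.md` changes above: the partitioning step can be illustrated with a minimal sketch. This is not the crate's code. The function name `split_blocks` and its parameters are hypothetical, and the stepping rule (advance by `block_size - overlap`, keep a shorter final block) is an assumption, since the diff does not show how `chaoscoder` handles the last fragment.

```rust
/// Split `seq` into fixed-size blocks in which consecutive blocks share
/// `overlap` bases. Hypothetical helper, not part of the chaoscoder API.
///
/// Returns `None` when the parameters cannot make progress
/// (zero block size, or an overlap at least as large as the block).
fn split_blocks(seq: &[u8], block_size: usize, overlap: usize) -> Option<Vec<&[u8]>> {
    if block_size == 0 || overlap >= block_size {
        return None;
    }
    // Each new block starts `step` bases after the previous one.
    let step = block_size - overlap;
    let mut blocks = Vec::new();
    let mut start = 0;
    while start < seq.len() {
        // The final block may be shorter than `block_size` (assumption).
        let end = usize::min(start + block_size, seq.len());
        blocks.push(&seq[start..end]);
        if end == seq.len() {
            break;
        }
        start += step;
    }
    Some(blocks)
}
```

For example, `split_blocks(b"ACGTACGTAC", 4, 1)` yields the blocks `ACGT`, `TACG` and `GTAC`, each sharing one base with its predecessor. Each block would then be encoded independently, which is what makes the approach easy to parallelize.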

src/cli.rs

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ use std::path::PathBuf;
88

99
#[derive(Parser, Debug)]
1010
#[command(
11-
name = "fastchaos",
11+
name = "chaoscoder",
1212
version = "0.1.0",
1313
author = "Anicet Ebou <anicet.ebou@gmail.com>",
1414
about = "Rapid encoding, decoding and analysis of DNA sequences with (Integer) Chaos Game Representation"

src/icgr.rs

Lines changed: 1 addition & 1 deletion
@@ -331,7 +331,7 @@ fn tri_integers_to_dna(tri_integers: TriIntegers) -> Vec<u8> {
331331
/// # Examples
332332
///
333333
/// ```
334-
/// use fastchaos::icgr::IChaos;
334+
/// use chaoscoder::icgr::IChaos;
335335
///
336336
/// let icgr = IChaos::new("seq1", "description", vec![1, 2, 3]);
337337
/// assert_eq!(icgr.id(), "seq1");
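
For context on the tri-integers handled in `src/icgr.rs`, here is a minimal sketch of iCGR encoding and decoding for a single short block. It is an illustration under stated assumptions, not the crate's implementation. The names `corner`, `encode` and `decode` are hypothetical. The corner layout A=(1,1), T=(-1,1), C=(-1,-1), G=(1,-1) and the update rule (add the corner scaled by 2^i for the base at 0-based position i) are assumed to follow the iCGR definition cited in the paper. `i128` is used only because it is wide enough for blocks of up to 100 nt.

```rust
/// Map a nucleotide to its corner of the CGR square.
/// ASSUMPTION: A=(1,1), T=(-1,1), C=(-1,-1), G=(1,-1).
fn corner(base: u8) -> Option<(i128, i128)> {
    match base.to_ascii_uppercase() {
        b'A' => Some((1, 1)),
        b'T' => Some((-1, 1)),
        b'C' => Some((-1, -1)),
        b'G' => Some((1, -1)),
        _ => None,
    }
}

/// Encode a block of at most 100 nt into the tri-integers (x, y, n).
/// Returns `None` for longer blocks or non-ACGT symbols.
fn encode(seq: &[u8]) -> Option<(i128, i128, usize)> {
    if seq.len() > 100 {
        return None;
    }
    let (mut x, mut y) = (0i128, 0i128);
    for (i, &base) in seq.iter().enumerate() {
        let (cx, cy) = corner(base)?;
        // The base at position i contributes its corner scaled by 2^i.
        let weight = 1i128 << i;
        x += weight * cx;
        y += weight * cy;
    }
    Some((x, y, seq.len()))
}

/// Decode tri-integers back into a DNA block by walking from the last
/// base to the first. Returns `None` if the input is not a valid encoding.
fn decode(mut x: i128, mut y: i128, n: usize) -> Option<Vec<u8>> {
    if n > 100 {
        return None;
    }
    let mut out = vec![0u8; n];
    for i in (0..n).rev() {
        // A valid coordinate is never zero while bases remain.
        if x == 0 || y == 0 {
            return None;
        }
        // The largest term dominates, so the signs identify the corner.
        let (base, cx, cy) = match (x > 0, y > 0) {
            (true, true) => (b'A', 1, 1),
            (false, true) => (b'T', -1, 1),
            (false, false) => (b'C', -1, -1),
            (true, false) => (b'G', 1, -1),
        };
        out[i] = base;
        let weight = 1i128 << i;
        x -= weight * cx;
        y -= weight * cy;
    }
    // Anything left over means (x, y, n) was not produced by `encode`.
    if x == 0 && y == 0 { Some(out) } else { None }
}
```

Under these assumptions, `encode(b"ACG")` gives `(3, -5, 3)` and `decode(3, -5, 3)` recovers `ACG`. Because the coordinates grow as 2^n, a fixed-width integer bounds the block length, which is the scaling limit the block-based variant works around.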
