Commit fcd01f4

committed
update paper
1 parent 4caf930 commit fcd01f4

1 file changed (+3 -6 lines changed)

paper.md

Lines changed: 3 additions & 6 deletions
@@ -1,5 +1,5 @@
 ---
-title: 'chaoscoder: block-based integer chaos game representation encoding and decoding of DNA sequences'
+title: 'chaoscoder: block-based integer chaos game representation encoding, decoding and analysis of DNA sequences'
 tags:
 - Rust
 - DNA sequence analysis
@@ -25,7 +25,7 @@ Computational analysis of DNA sequences underpins numerous bioinformatics applic
 # Statement of need

 The exponential growth of genomic datasets necessitates robust, scalable, and reversible methods for DNA sequence encoding that can support downstream computational workflows. Among existing approaches, CGR has been widely adopted for its ability to visualize and analyze nucleotide composition in a geometric framework [@almeida_analysis_2001]. However, CGR suffers from inherent limitations: it relies on floating-point arithmetic, which introduces precision errors, lacks scalability to large sequences, and is not reversible, making exact sequence reconstruction impossible. The Integer Chaos Game Representation introduced by Yin *et al.* [@yin_encoding_2018], addresses these shortcomings by providing a mathematically rigorous and fully reversible encoding scheme based on integer arithmetic. Despite its theoretical advantages, iCGR remains underutilized due to the absence of a comprehensive, open-source implementation suitable for genome-scale applications. Apart from the illustrative prototype provided by the original authors, no available software supports full encoding, decoding, and standardized storage of iCGR coordinates in a format adapted to large-scale, reproducible workflows.
-This software fills that gap by offering a modular, high-performance implementation of iCGR. It introduces a block-based strategy capable of handling arbitrarily long sequences through segmented and overlapping encoding, ensuring both scalability and reversibility. The toolset includes efficient utilities for encoding, decoding, and storing sequences in a compressed, structured format that is suitable for integration into bioinformatics pipelines. By enabling reproducible, high-throughput analyses, this implementation makes iCGR practically accessible to researchers working in genome classification, alignment-free comparison, compression, evolutionary genomics, and machine learning applications [@chicco_ten_2017]. It combines computational efficiency with mathematical rigor to support exact sequence recovery and interpretable analyses at scale.
+This software fills that gap by offering a modular, high-performance implementation of iCGR. It introduces a block-based strategy capable of handling arbitrarily long sequences through segmented and overlapping encoding, ensuring both scalability and reversibility. `chaoscoder` includes efficient utilities for encoding, decoding, and storing sequences in a structured format that is suitable for integration into bioinformatics pipelines [@chicco_ten_2017].

 # Implementation

@@ -52,10 +52,7 @@ It includes the sequence ID (mandatory), the sequence description (optional), th
 ## Other features

 `chaoscoder` offers additional functionalities to support exploratory and comparative genomics. First, the software can generate 2D CGR images for encoded sequences. Second, users can compute Structural Similarity Index (SSIM) between CGR images to compare sequence patterns without alignment. Finally, encoding and decoding tasks are multithreaded to improve performance on large datasets.
-
-
-# Installation
-
 `chaoscoder` is written in Rust and distributed via GitHub at [https://github.com/Ebedthan/chaoscoder](https://github.com/Ebedthan/chaoscoder).

+
 # References
