update paper

Ebedthan · Ebedthan · commit fcd01f414c01 · 2025-07-17T11:42:04.000Z
diff --git a/paper.md b/paper.md
@@ -1,5 +1,5 @@
 ---
-title: 'chaoscoder: block-based integer chaos game representation encoding and decoding of DNA sequences'
+title: 'chaoscoder: block-based integer chaos game representation encoding, decoding and analysis of DNA sequences'
 tags:
   - Rust
   - DNA sequence analysis
@@ -25,7 +25,7 @@ Computational analysis of DNA sequences underpins numerous bioinformatics applic
 # Statement of need
 
 The exponential growth of genomic datasets necessitates robust, scalable, and reversible methods for DNA sequence encoding that can support downstream computational workflows. Among existing approaches, CGR has been widely adopted for its ability to visualize and analyze nucleotide composition in a geometric framework [@almeida_analysis_2001]. However, CGR has inherent limitations: it relies on floating-point arithmetic, which introduces precision errors; it scales poorly to long sequences; and it is not reversible, which makes exact sequence reconstruction impossible. The Integer Chaos Game Representation (iCGR) introduced by Yin *et al.* [@yin_encoding_2018] addresses these shortcomings by providing a mathematically rigorous and fully reversible encoding scheme based on integer arithmetic. Despite its theoretical advantages, iCGR remains underutilized due to the absence of a comprehensive, open-source implementation suitable for genome-scale applications. Apart from the illustrative prototype provided by the original authors, no available software supports full encoding, decoding, and standardized storage of iCGR coordinates in a format adapted to large-scale, reproducible workflows.
-This software fills that gap by offering a modular, high-performance implementation of iCGR. It introduces a block-based strategy capable of handling arbitrarily long sequences through segmented and overlapping encoding, ensuring both scalability and reversibility. The toolset includes efficient utilities for encoding, decoding, and storing sequences in a compressed, structured format that is suitable for integration into bioinformatics pipelines. By enabling reproducible, high-throughput analyses, this implementation makes iCGR practically accessible to researchers working in genome classification, alignment-free comparison, compression, evolutionary genomics, and machine learning applications [@chicco_ten_2017]. It combines computational efficiency with mathematical rigor to support exact sequence recovery and interpretable analyses at scale.
+This software fills that gap by offering a modular, high-performance implementation of iCGR. It introduces a block-based strategy capable of handling arbitrarily long sequences through segmented and overlapping encoding, ensuring both scalability and reversibility. `chaoscoder` includes efficient utilities for encoding, decoding, and storing sequences in a structured format that is suitable for integration into bioinformatics pipelines [@chicco_ten_2017].
 
 # Implementation
 
@@ -52,10 +52,7 @@ It includes the sequence ID (mandatory), the sequence description (optional), th
 ## Other features
 
 `chaoscoder` offers additional functionalities to support exploratory and comparative genomics. First, the software can generate 2D CGR images for encoded sequences. Second, users can compute the Structural Similarity Index (SSIM) between CGR images to compare sequence patterns without alignment. Finally, encoding and decoding tasks are multithreaded to improve performance on large datasets.
-
-
-# Installation
-
 `chaoscoder` is written in Rust and distributed via GitHub at [https://github.com/Ebedthan/chaoscoder](https://github.com/Ebedthan/chaoscoder).
 
+
 # References
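
The reversible integer encoding that the Statement of need refers to can be illustrated with a minimal Rust sketch of a single block. This is not `chaoscoder`'s actual code or API: the function names `corner`, `base_at`, `encode_block` and `decode_block`, the corner assignment (A, T, C, G on the four corners of a square), and the 100-base block limit chosen so that coordinates fit in an `i128` are all assumptions made for illustration. The sketch also ignores the overlap between blocks that the paper mentions.

```rust
/// Corner assigned to each nucleotide (assumed mapping, for illustration only).
fn corner(base: char) -> Option<(i128, i128)> {
    match base.to_ascii_uppercase() {
        'A' => Some((1, 1)),
        'T' => Some((-1, 1)),
        'C' => Some((-1, -1)),
        'G' => Some((1, -1)),
        _ => None, // ambiguous or invalid symbols are rejected
    }
}

/// Inverse of `corner`: recover the nucleotide from the signs of a point.
fn base_at(x: i128, y: i128) -> Option<char> {
    match (x.signum(), y.signum()) {
        (1, 1) => Some('A'),
        (-1, 1) => Some('T'),
        (-1, -1) => Some('C'),
        (1, -1) => Some('G'),
        _ => None,
    }
}

/// Hypothetical block limit: coordinates stay below 2^100 and fit in an i128.
const MAX_BLOCK: usize = 100;

/// Encode one block as (x, y, n): each base at 0-based position i
/// moves the point by 2^i times its corner vector.
fn encode_block(seq: &str) -> Option<(i128, i128, usize)> {
    let n = seq.chars().count();
    if n > MAX_BLOCK {
        return None;
    }
    let (mut x, mut y) = (0i128, 0i128);
    for (i, base) in seq.chars().enumerate() {
        let (ax, ay) = corner(base)?;
        let step = 1i128 << i;
        x += step * ax;
        y += step * ay;
    }
    Some((x, y, n))
}

/// Decode a block by peeling off the last base first: the largest step
/// dominates the sum of all earlier ones, so the signs of (x, y)
/// identify the most recently added corner.
fn decode_block(mut x: i128, mut y: i128, n: usize) -> Option<String> {
    if n > MAX_BLOCK {
        return None;
    }
    let mut bases = Vec::with_capacity(n);
    for i in (0..n).rev() {
        let base = base_at(x, y)?;
        let (ax, ay) = corner(base)?;
        let step = 1i128 << i;
        x -= step * ax;
        y -= step * ay;
        bases.push(base);
    }
    bases.reverse();
    Some(bases.into_iter().collect())
}
```

Under these assumptions, `encode_block("ACGT")` yields the triple `(-5, 3, 4)`, and `decode_block(-5, 3, 4)` returns the original sequence. Because only integer additions and shifts are involved, the round trip is exact. A longer sequence would be split into several such blocks, each stored with its own coordinates and length.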