Description

This project contains the scripts and data required to reproduce the figures in the paper "Macroscopic Analyses of RNA-Seq Data to Reveal Chromatin Modifications in Aging and Disease" (submitted to eLife). The scripts are designed to run with Python 3.10.7, so make sure that version is installed.

Project Structure

data/: Contains the data files required for analysis and figure generation, organized into the following sub-folders:
- input/: Contains the input files.
- output/: Contains the output generated by the algorithm scripts.
- preprocessed_data/: Contains preprocessed data generated by the preprocessing script.
scripts/: Scripts for data processing, analysis, and figure reproduction.
README.md: Project documentation.
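
Before running anything, it can help to confirm that a checkout matches the layout above. The following is a minimal sketch, not part of the repository: `missing_paths` is a hypothetical helper, and the list of paths is simply copied from the structure described here.

```python
from pathlib import Path

# Paths taken from the layout described above, relative to the repository root.
EXPECTED = [
    "data/input",
    "data/output",
    "data/preprocessed_data",
    "scripts",
    "README.md",
]


def missing_paths(root="."):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    for rel in missing_paths():
        print("missing:", rel)
```

Run from the repository root, the script prints nothing when every listed entry is present.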

Usage instructions

Create Environment

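Install Python 3.10 (Homebrew formulae are versioned by minor release, so there is no python@3.10.7 formula; use a version manager if you need exactly 3.10.7)

    brew install python@3.10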

Install poetry

    curl -sSL https://install.python-poetry.org | python3 -
    poetry self add poetry-dotenv-plugin

Setup venv

    poetry shell
    poetry update
    poetry install
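
Once the environment is active, a quick check can confirm that the interpreter matches the version this README targets. This is a sketch under assumptions: `check_interpreter` is a hypothetical helper that is not shipped with the repository, and it compares only the major and minor version (3.10) rather than the exact 3.10.7 patch release.

```python
import sys

REQUIRED = (3, 10)  # assumption: only major.minor is checked, not the .7 patch


def check_interpreter(version_info=sys.version_info, prefix=sys.prefix,
                      base_prefix=sys.base_prefix):
    """Return a list of human-readable problems; empty when all looks fine."""
    problems = []
    if tuple(version_info[:2]) != REQUIRED:
        problems.append(
            f"expected Python {REQUIRED[0]}.{REQUIRED[1]}.x, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    # Inside a virtual environment, sys.prefix differs from sys.base_prefix.
    if prefix == base_prefix:
        problems.append("no virtual environment appears to be active")
    return problems


if __name__ == "__main__":
    for problem in check_interpreter():
        print("WARNING:", problem)
```

Running the script inside the Poetry shell should print no warnings.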

Running the Workflow

Preprocessing the Data

Before running the algorithms, you need to preprocess the raw data to generate the necessary input datasets.

Ensure the correct dataset is selected in scripts/params_preprocess.yaml.
Open and run the scripts/preprocessing.ipynb notebook. This will generate preprocessed data and store it in data/Preprocessed_data/.
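
For unattended runs, the notebook could also be executed from a script instead of being opened interactively. The sketch below assumes that `jupyter` and `nbconvert` are installed in the Poetry environment, which this README does not confirm. `nbconvert_command` and `run_notebook` are hypothetical helpers, not part of the repository.

```python
import subprocess


def nbconvert_command(notebook):
    """Build a `jupyter nbconvert` call that executes a notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",   # run every cell from top to bottom
        "--inplace",   # write the outputs back into the same .ipynb file
        notebook,
    ]


def run_notebook(notebook):
    """Execute the notebook; raises CalledProcessError if execution fails."""
    subprocess.run(nbconvert_command(notebook), check=True)
```

Calling `run_notebook("scripts/preprocessing.ipynb")` from the repository root would then run the preprocessing step. Note that `--inplace` overwrites the notebook with its executed version.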

Running L-Star (ICGCL) Algorithm

Open scripts/icgcl/icgcl_script.ipynb.
Ensure the correct dataset is selected in scripts/icgcl/params_icgcl.yaml.
Run all the cells in the notebook to generate results.
Run scripts/icgcl/PostProcessing_icgcl_{active_dataset}.ipynb to format and analyze the results.

Running CEL Algorithm

Open scripts/Cel/cel_script.ipynb.
Ensure the correct dataset is selected in scripts/Cel/params_cel.yaml.
Run all the cells in the notebook to generate results.
Run scripts/Cel/postprocessing_cel_{active_dataset}.ipynb to format and analyze the results.
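
The post-processing notebook to run depends on both the algorithm and the active dataset. The sketch below shows how the two file-name patterns from the steps above resolve to concrete paths. It assumes the dataset name is substituted verbatim into the file name, which this README does not state explicitly; `postprocessing_notebook` is a hypothetical helper.

```python
# File-name patterns copied from the steps above; {active_dataset} is the
# placeholder used there.
POSTPROCESSING_NOTEBOOKS = {
    "icgcl": "scripts/icgcl/PostProcessing_icgcl_{active_dataset}.ipynb",
    "cel": "scripts/Cel/postprocessing_cel_{active_dataset}.ipynb",
}


def postprocessing_notebook(algorithm, active_dataset):
    """Return the post-processing notebook path for an algorithm and dataset."""
    try:
        pattern = POSTPROCESSING_NOTEBOOKS[algorithm.lower()]
    except KeyError:
        raise ValueError(f"unknown algorithm: {algorithm!r}") from None
    return pattern.format(active_dataset=active_dataset)


if __name__ == "__main__":
    print(postprocessing_notebook("icgcl", "Fleischer"))
    print(postprocessing_notebook("cel", "Fleischer"))
```

Note that the two patterns differ in capitalization (`PostProcessing_` versus `postprocessing_`), which matters on case-sensitive file systems.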

Managing Datasets in Different Configuration Files

The repository supports multiple datasets. Each script reads its own YAML configuration file, which holds the dataset-specific parameters and lets you switch between datasets easily.

Configuration Files Overview

There are three main configuration files, each serving a different purpose:

scripts/icgcl/params_icgcl.yaml – Configuration for the ICGCL (L-Star) algorithm.
scripts/Cel/params_cel.yaml – Configuration for the CEL algorithm.
scripts/params_preprocess.yaml – Configuration for the preprocessing script.

Each of these files contains dataset-specific settings such as file paths, experiment lists, and algorithm parameters.
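
Because each script reads its own file, the three configurations can drift apart. The following sketch reports which dataset each file currently selects. It assumes every file contains a single `active_dataset:` line, and it uses a regular expression rather than a YAML parser to stay dependency-free; a real YAML library would be more robust. `read_active_dataset` and `active_datasets` are hypothetical helpers.

```python
import re
from pathlib import Path

# The three configuration files listed above.
CONFIG_FILES = [
    "scripts/params_preprocess.yaml",
    "scripts/icgcl/params_icgcl.yaml",
    "scripts/Cel/params_cel.yaml",
]

# Matches a line such as `    active_dataset: "Fleischer"  # comment`.
_ACTIVE = re.compile(
    r"""^[ \t]*active_dataset:[ \t]*["']?([^"'#\n]+?)["']?[ \t]*(?:#.*)?$""",
    re.MULTILINE,
)


def read_active_dataset(text):
    """Return the active_dataset value found in YAML text, or None."""
    match = _ACTIVE.search(text)
    return match.group(1) if match else None


def active_datasets(root="."):
    """Map each config file that exists under `root` to its active dataset."""
    found = {}
    for rel in CONFIG_FILES:
        path = Path(root) / rel
        if path.is_file():
            found[rel] = read_active_dataset(path.read_text(encoding="utf-8"))
    return found


if __name__ == "__main__":
    for rel, name in active_datasets().items():
        print(f"{rel}: {name}")
```

If the printed names differ, the preprocessing and algorithm notebooks would be working on different datasets.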

Switching the Active Dataset

To switch datasets, you need to update the relevant configuration file based on the script you are running. Locate the active_dataset parameter and update it to the desired dataset name.

    settings:
        active_dataset: "LINE-1"  # Change to "LINE-1" or "Fleischer" to switch datasets
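
The same change can be scripted when switching datasets often. The sketch below rewrites only the value on the `active_dataset:` line and leaves the trailing comment untouched. It assumes the two names in the comment above are the only valid datasets and that the key appears exactly once per file; `set_active_dataset` is a hypothetical helper, not part of the repository.

```python
import re

KNOWN_DATASETS = ("LINE-1", "Fleischer")  # names taken from the comment above

# Group 1: indentation and key; group 2: trailing whitespace and comment.
_KEY = re.compile(
    r"""^([ \t]*active_dataset:[ \t]*)"""
    r"""(?:"[^"\n]*"|'[^'\n]*'|[^#\n]*?)"""
    r"""([ \t]*(?:#.*)?)$""",
    re.MULTILINE,
)


def set_active_dataset(text, name):
    """Return `text` with the active_dataset value replaced by `name`."""
    if name not in KNOWN_DATASETS:
        raise ValueError(f"unknown dataset: {name!r}")
    new_text, count = _KEY.subn(
        lambda m: f'{m.group(1)}"{name}"{m.group(2)}', text
    )
    if count != 1:
        raise ValueError(f"expected one active_dataset line, found {count}")
    return new_text


if __name__ == "__main__":
    example = 'settings:\n    active_dataset: "LINE-1"  # switch here\n'
    print(set_active_dataset(example, "Fleischer"))
```

To apply it to a file, read the file's text, pass it through `set_active_dataset`, and write the result back.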

Repository: altoslabs/papers-2025-rnaseq-chrom-aging
