altoslabs/papers-2025-rnaseq-chrom-aging

Description

This project contains scripts, along with data, required to reproduce the figures in the paper titled "Macroscopic Analyses of RNA-Seq Data to Reveal Chromatin Modifications in Aging and Disease" (submitted to eLife). The scripts are designed to run with Python 3.10.7, so ensure you have the correct version installed.

Project Structure

  • data/: Contains the data files required for analysis and figure generation, organized into the following sub-folders:
    • input/: Contains the input files.
    • output/: Contains the output generated by the algorithm scripts.
    • preprocessed_data/: Contains preprocessed data generated by the preprocessing script.
  • scripts/: Scripts for data processing, analysis, and figure reproduction.

  • README.md: Project documentation.
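Before running anything, it can help to confirm that the checkout matches the layout listed above. The following is a minimal sketch: the path names are copied from the list, the helper name `missing_paths` is hypothetical, and the capitalization of the preprocessed data folder may differ in the actual repository.

```python
from pathlib import Path

# Paths listed in the project structure above, relative to the repository root.
# Assumption: names and capitalization are exactly as written in that list.
EXPECTED_PATHS = [
    "data/input",
    "data/output",
    "data/preprocessed_data",
    "scripts",
    "README.md",
]


def missing_paths(repo_root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumes the script is started from the repository root.
    for rel in missing_paths("."):
        print(f"missing: {rel}")
```

An empty result means every listed folder and file was found.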

Usage instructions

Create Environment

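  1. Install Python 3.10.7. Homebrew versions its Python formulae by minor release, so the command below installs the latest 3.10.x; use a version manager if you need exactly 3.10.7.
    brew install python@3.10
  2. Install Poetry
    curl -sSL https://install.python-poetry.org | python3 -
    poetry self add poetry-dotenv-plugin
  3. Set up the virtual environment
    poetry shell
    poetry update
    poetry install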
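Because the scripts target one specific interpreter version, a quick check inside the activated environment can catch a mismatch early. This is only a sketch; the function name `python_matches` is hypothetical and the required version is the one stated in the Description.

```python
import sys

# Version stated in the Description section of this README.
REQUIRED_PYTHON = (3, 10, 7)


def python_matches(version_info=None, required=REQUIRED_PYTHON):
    """Return True if major.minor.micro of the interpreter equals `required`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) == tuple(required)


if __name__ == "__main__":
    found = ".".join(map(str, sys.version_info[:3]))
    wanted = ".".join(map(str, REQUIRED_PYTHON))
    status = "OK" if python_matches() else "MISMATCH"
    print(f"{status}: running Python {found}, README asks for {wanted}")
```

Saved under any file name, it can be run with `poetry run python <file>` so that it reports the interpreter Poetry actually uses.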

Running the Workflow

Preprocessing the Data

Before running the algorithms, preprocess the raw data to generate the necessary input datasets.

  1. Ensure the correct dataset is selected in scripts/params_preprocess.yaml.

  2. Open and run the scripts/preprocessing.ipynb notebook. This will generate preprocessed data and store it in data/Preprocessed_data/.

Running L-Star (ICGCL) Algorithm

  1. Open scripts/icgcl/icgcl_script.ipynb.

  2. Ensure the correct dataset is selected in scripts/icgcl/params_icgcl.yaml.

  3. Run all the cells in the notebook to generate results.

  4. Run scripts/icgcl/PostProcessing_icgcl_{active_dataset}.ipynb to format and analyze the results.

Running CEL Algorithm

  1. Open scripts/Cel/cel_script.ipynb.

  2. Ensure the correct dataset is selected in scripts/Cel/params_cel.yaml.

  3. Run all the cells in the notebook to generate results.

  4. Run scripts/Cel/postprocessing_cel_{active_dataset}.ipynb to format and analyze the results.
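The notebook order for both algorithms can be summarized in a small lookup. This is an illustrative sketch, not part of the repository: the path templates are copied from the steps above, the helper `notebooks_for` is hypothetical, and it assumes the dataset name appears in the post-processing file name exactly as written in the YAML file.

```python
# Path templates copied from the steps above; {active_dataset} is filled in
# with the value set in the corresponding YAML configuration file.
NOTEBOOKS = {
    "icgcl": [
        "scripts/icgcl/icgcl_script.ipynb",
        "scripts/icgcl/PostProcessing_icgcl_{active_dataset}.ipynb",
    ],
    "cel": [
        "scripts/Cel/cel_script.ipynb",
        "scripts/Cel/postprocessing_cel_{active_dataset}.ipynb",
    ],
}


def notebooks_for(algorithm, active_dataset):
    """Return the notebooks to run, in order, for one algorithm and dataset."""
    templates = NOTEBOOKS[algorithm]
    return [t.format(active_dataset=active_dataset) for t in templates]


if __name__ == "__main__":
    for path in notebooks_for("cel", "Fleischer"):
        print(path)
```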

Managing Datasets in Different Configuration Files

The repository supports multiple datasets. Each script reads its own YAML configuration file, which holds the dataset-specific parameters and selects the dataset to use.

Configuration Files Overview

There are three main configuration files, each serving a different purpose:

  1. scripts/icgcl/params_icgcl.yaml – Configuration for the ICGCL (L-Star) algorithm.
  2. scripts/Cel/params_cel.yaml – Configuration for the CEL algorithm.
  3. scripts/params_preprocess.yaml – Configuration for the preprocessing script.

Each of these files contains dataset-specific settings such as file paths, experiment lists, and algorithm parameters.

Switching the Active Dataset

To switch datasets, open the configuration file for the script you are running and set the active_dataset parameter to the desired dataset name.

    settings:
        active_dataset: "LINE-1"  # Change to "LINE-1" or "Fleischer" to switch datasets
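If you switch datasets often, the edit can be scripted. The sketch below works on the file's text rather than parsing YAML, so indentation and the trailing comment are kept; the function name `set_active_dataset` is hypothetical, and it assumes the file contains a single `active_dataset:` line shaped like the one above.

```python
import re

# Matches the `active_dataset:` line. The text before the value (indentation
# and key) and after it (spacing and any trailing comment) is captured so it
# can be written back unchanged.
_ACTIVE_RE = re.compile(
    r"^(?P<head>[ \t]*active_dataset:[ \t]*)"
    r"(?P<quote>[\"']?)(?P<value>[^\"'#\s]+)(?P=quote)"
    r"(?P<tail>.*)$",
    re.MULTILINE,
)


def set_active_dataset(yaml_text, dataset):
    """Return yaml_text with the first active_dataset value set to `dataset`."""
    new_text, count = _ACTIVE_RE.subn(
        lambda m: f'{m.group("head")}"{dataset}"{m.group("tail")}',
        yaml_text,
        count=1,
    )
    if count == 0:
        raise ValueError("no active_dataset entry found")
    return new_text
```

To apply it to one of the configuration files, read the file with `pathlib.Path.read_text()`, pass the text through `set_active_dataset`, and write the result back with `write_text()`. A YAML library would also work, but many of them drop comments when re-serializing the file.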
