This project contains scripts, along with data, required to reproduce the figures in the paper titled "Macroscopic Analyses of RNA-Seq Data to Reveal Chromatin Modifications in Aging and Disease" (submitted to eLife). The scripts are designed to run with Python 3.10.7, so ensure you have the correct version installed.
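A quick way to confirm the interpreter version before running anything is a short check like the one below. This is only a minimal sketch: the helper name `is_required_python` is hypothetical and not part of the repository, and the exact-match rule is an assumption drawn from the version stated above.

```python
import sys

REQUIRED = (3, 10, 7)  # version stated in this README


def is_required_python(actual=None, required=REQUIRED):
    """Return True if the (major, minor, micro) version matches exactly."""
    if actual is None:
        actual = sys.version_info
    # Compare only the first three components; ignore release level and serial.
    return tuple(actual[:3]) == tuple(required)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    wanted = ".".join(str(part) for part in REQUIRED)
    if is_required_python():
        print(f"Python {found}: OK")
    else:
        print(f"Python {found} found, but {wanted} is expected")
```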
- data/: Contains data files required for analysis and figure generation. It contains the following sub-folders:
  - input/: Contains the input files.
  - output/: Contains the output generated by the algorithm scripts.
  - preprocessed_data/: Contains preprocessed data generated by the preprocessing script.
- scripts/: Scripts for data processing, analysis, and figure reproduction.
- README.md: Project documentation.

- Install Python 3.10.7:
    brew install python@3.10.7
- Install Poetry:
    curl -sSL https://install.python-poetry.org | python3 -
    poetry self add poetry-dotenv-plugin
- Set up the virtual environment:
    poetry shell
    poetry update
    poetry install

Before running the algorithms, you need to preprocess the raw data to generate the necessary input datasets.
- Ensure the correct dataset is selected in scripts/params_preprocess.yaml.
- Open and run the scripts/preprocessing.ipynb notebook. This will generate preprocessed data and store it in data/Preprocessed_data/.

To run the ICGCL algorithm:
- Open scripts/icgcl/icgcl_script.ipynb.
- Ensure the correct dataset is selected in scripts/icgcl/params_icgcl.yaml.
- Run all the cells in the notebook to generate results.
- Run scripts/icgcl/PostProcessing_icgcl_{active_dataset}.ipynb to format and analyze the results.

To run the CEL algorithm:
- Open scripts/Cel/cel_script.ipynb.
- Ensure the correct dataset is selected in scripts/Cel/params_cel.yaml.
- Run all the cells in the notebook to generate results.
- Run scripts/Cel/postprocessing_cel_{active_dataset}.ipynb to format and analyze the results.

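
The `{active_dataset}` placeholder in the post-processing notebook names can be resolved as sketched below. The templates are copied from the steps above. The helper `postprocessing_notebook` is hypothetical, and it assumes the dataset name is substituted verbatim into the file name, which this README does not confirm.

```python
from pathlib import PurePosixPath

# Name templates copied from the steps above; {active_dataset} is the placeholder.
POSTPROCESSING_TEMPLATES = {
    "icgcl": "scripts/icgcl/PostProcessing_icgcl_{active_dataset}.ipynb",
    "cel": "scripts/Cel/postprocessing_cel_{active_dataset}.ipynb",
}


def postprocessing_notebook(algorithm, active_dataset):
    """Build the post-processing notebook path for an algorithm and dataset.

    Assumption: the dataset name appears unchanged in the file name.
    """
    template = POSTPROCESSING_TEMPLATES[algorithm.lower()]
    return PurePosixPath(template.format(active_dataset=active_dataset))


print(postprocessing_notebook("icgcl", "Fleischer"))
print(postprocessing_notebook("cel", "Fleischer"))
```

If the resolved path does not exist in your checkout, check the actual file names in scripts/icgcl/ and scripts/Cel/ rather than relying on this pattern.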
The repository supports multiple datasets, and each dataset configuration is managed through separate YAML configuration files for different scripts. These configuration files allow you to specify dataset-specific parameters and switch between datasets easily.
There are three main configuration files, each serving a different purpose:
- scripts/icgcl/params_icgcl.yaml – Configuration for the ICGCL (Ell Star) algorithm.
- scripts/Cel/params_cel.yaml – Configuration for the CEL algorithms.
- scripts/params_preprocess.yaml – Configuration for the preprocessing script.
Each of these files contains dataset-specific settings such as file paths, experiment lists, and algorithm parameters.
To switch datasets, you need to update the relevant configuration file based on the script you are running. Locate the active_dataset parameter and update it to the desired dataset name.
settings:
  active_dataset: "LINE-1"  # Change to "LINE-1" or "Fleischer" to switch datasets
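
If you prefer to switch datasets from a script rather than editing the file by hand, something along these lines may work. It is a minimal sketch using only the standard library: the function `set_active_dataset` is hypothetical, and it assumes `active_dataset` appears exactly once, on a single line, in the form shown above. It edits the text directly instead of parsing the YAML, so comments and layout are left untouched.

```python
import re

# Matches a line such as:   active_dataset: "LINE-1"  # comment
# Only the key and its value are matched; any trailing comment is preserved.
_ACTIVE_DATASET = re.compile(
    r'^(?P<prefix>[ \t]*active_dataset:[ \t]*)(?P<quote>["\']?)[^"\'#\s]+(?P=quote)',
    flags=re.MULTILINE,
)


def set_active_dataset(yaml_text, dataset):
    """Return yaml_text with the active_dataset value replaced by `dataset`."""
    new_text, count = _ACTIVE_DATASET.subn(
        lambda m: f'{m.group("prefix")}"{dataset}"', yaml_text
    )
    if count != 1:
        raise ValueError(f"expected exactly one active_dataset line, found {count}")
    return new_text


example = 'settings:\n  active_dataset: "LINE-1"  # Change to switch datasets\n'
print(set_active_dataset(example, "Fleischer"))
```

To apply this to one of the configuration files, read it with `pathlib.Path(...).read_text()`, pass the text through the function, and write the result back with `write_text()`.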