Pan-organ poly(A) atlas reveals a post-transcriptional regulatory layer independent of transcription

Overview

This study constructs a comprehensive poly(A) tail atlas by performing unprecedentedly deep, full-length nanopore sequencing across 18 mouse organs. Initial processing of the raw data, from FAST5 files to poly(A) tail length measurement, was performed with the FLEP-seq analysis pipeline. This repository contains the code for the steps that follow: the Snakemake preprocessing workflow and the downstream analysis and visualization presented in the manuscript, "Pan-organ poly(A) atlas reveals a post-transcriptional regulatory layer independent of transcription".

Data and Code Availability

Interactive Data Portal: You can query, visualize, and download the poly(A) tail length and gene expression data from our Mouse Poly(A) Tail Atlas website.
Raw Sequencing Data: The FLEP-seq2 data generated in this study have been deposited in the GSA (Genome Sequence Archive) database under accession number CRA028430.
Analysis Code: All code for the preprocessing pipeline and downstream analysis is hosted in this GitHub repository.

System Requirements

Python (version >= 3.8 is recommended)
- numpy (>= 1.23)
- pandas (>= 1.5.3)
- scikit-learn (>= 1.2.2)
- matplotlib (>= 3.7)
- seaborn (>= 0.11.2)
- pysam (>= 0.21.0)
- scipy (>= 1.10.1)
- gseapy (>= 1.0.5)
R (version 4.2.2)
- WGCNA (>= 1.73)
- tidyverse (>= 1.3.2)
- dplyr (>= 1.1.0)
- BiocManager
Command-line tools
- IsoQuant (v3.6.0)
- samtools (v1.3.1)
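
The Python minimums listed above can be checked against an active environment with a short script. This is a minimal sketch and not part of the repository: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and the version parsing is deliberately simple (it compares only the leading numeric components).

```python
from importlib import metadata

# Minimum versions copied from the requirements list above.
MIN_VERSIONS = {
    "numpy": "1.23",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "matplotlib": "3.7",
    "seaborn": "0.11.2",
    "pysam": "0.21.0",
    "scipy": "1.10.1",
    "gseapy": "1.0.5",
}


def parse_version(text):
    """Turn '1.5.3' into (1, 5, 3); stop at the first non-numeric suffix."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):  # e.g. '0rc1': keep the 0, ignore the rest
            break
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to the same length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(requirements=MIN_VERSIONS):
    """Return {package: (installed_version_or_None, ok)} for each requirement."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "ok" if ok else "missing or too old"
        print(f"{name:<14} {installed or '-':<10} {status}")
```

Running the script inside the activated Python environment prints one line per package. The R packages and command-line tools are not covered by this check.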

Installation Guide

This project requires separate Python and R environments. The recommended method for managing these is conda.

Step 1: Clone the Repository

First, clone this repository to your local machine and navigate into the directory.

```
git clone https://github.com/ZhaiLab-SUSTech/Mouse_polya_atlas.git
cd Mouse_polya_atlas
```

Step 2: Set Up the Python Environment

The repository provides a python_env.yml file that specifies the Python dependencies. Create the conda environment from this file:

```
# Create the environment
conda env create -f python_env.yml

# You can activate it when needed with:
# conda activate data_prep_env
```

Step 3: Set Up the R Environment

This environment is for running the main WGCNA analysis.

Create a base R environment using conda:

```
conda create -n wgcna_env -c conda-forge r-base=4.2.2
```

Activate the new R environment:
```
conda activate wgcna_env
```
Install the required R packages:
```
Rscript install_packages.R
```

Project Structure

The project is organized into a main Snakemake workflow, supplemented by individual scripts for debugging and a modular collection of downstream analyses.

Snakefile & config.yaml: The central Snakemake workflow and its configuration, orchestrating the entire preprocessing pipeline.
data/: Contains annotation files and defines the expected directory structure for input data. Note: The fasta and gtf files in data/annotation/ are placeholders and should be replaced with the actual reference files before running the pipeline.
scripts/: Contains all executable scripts.
- preprocessing_pipeline/: Scripts for the main data processing workflow. Each step includes a submit_*.sh script for manual execution and debugging on a cluster.
- Downstream_analysis/: A collection of modular Python and R scripts used to generate the figures and statistics for the manuscript.
- utils/: General utility scripts called by the main pipelines.
results/: Stores all output files generated by the analysis (this directory is not tracked by Git).
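
Before running anything, it can help to confirm that a checkout matches the layout described above and that the annotation placeholders have been replaced. The following is a rough sketch, not a script shipped with the repository. The function names are hypothetical, and it assumes that an unreplaced placeholder is a zero-byte file, which may not match how the placeholders are actually stored.

```python
from pathlib import Path

# Paths named in the project structure above. results/ is omitted because it
# is not tracked by Git and may not exist in a fresh clone.
EXPECTED = [
    "Snakefile",
    "config.yaml",
    "data/annotation",
    "scripts/preprocessing_pipeline",
    "scripts/Downstream_analysis",
    "scripts/utils",
]


def missing_entries(repo_root):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


def empty_annotation_files(repo_root):
    """List zero-byte files in data/annotation/ (assumed to be placeholders)."""
    annotation = Path(repo_root) / "data" / "annotation"
    if not annotation.is_dir():
        return []
    return sorted(
        p.name
        for p in annotation.iterdir()
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    print("missing:", missing_entries("."))
    print("empty annotation files:", empty_annotation_files("."))
```

Both lists should be empty once the reference fasta and gtf files are in place.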

Analysis Pipeline

The analysis is organized into a primary preprocessing workflow managed by Snakemake, followed by a series of downstream analysis scripts.

Preprocessing Workflow

The entire preprocessing pipeline, from raw BAM files to the final distance matrices, is defined in the Snakefile.
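
One way to launch the workflow is a dry run first, then a real run. The sketch below wraps the Snakemake command line in Python. It assumes Snakemake is installed in the active environment (it is not in the requirements list above), the core count is an arbitrary example, and the helper names `snakemake_command` and `run_workflow` are hypothetical. The only options used are the standard `--snakefile`, `--cores`, and `-n` (dry run).

```python
import shutil
import subprocess


def snakemake_command(cores=4, dry_run=True, snakefile="Snakefile"):
    """Build the argument list for a Snakemake invocation."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    cmd = ["snakemake", "--snakefile", snakefile, "--cores", str(cores)]
    if dry_run:
        cmd.append("-n")  # list the planned jobs without executing them
    return cmd


def run_workflow(repo_root=".", **kwargs):
    """Run Snakemake from the repository root; raises if the run fails."""
    if shutil.which("snakemake") is None:
        raise RuntimeError("snakemake not found on PATH; activate its environment first")
    return subprocess.run(snakemake_command(**kwargs), cwd=repo_root, check=True)


if __name__ == "__main__":
    # Print the dry-run command without executing it.
    print(" ".join(snakemake_command()))
```

Calling `run_workflow(dry_run=True)` shows which rules would run with the current config.yaml, and `run_workflow(cores=16, dry_run=False)` would execute them.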

Downstream Analysis

The scripts located in scripts/Downstream_analysis/ are used to generate the figures and statistical results presented in the paper. These are designed to be run manually after the preprocessing workflow is complete. Please see the individual scripts for details on their inputs and outputs.

Note on Heatmaps: The pan-organ gene expression and poly(A) distribution heatmaps presented in the manuscript were generated using the tools available on our interactive Mouse Poly(A) Tail Atlas website. Therefore, the code for generating these specific figures is not included in this repository.

How to Cite

If you use our data or code in your research, please cite our paper:

Lei, H., Long, Y., Wu, S., Wang, X., Peng, Y., Liu, Z., Lu, W., Yi, S., Zou, M., Xia, Y., et al. (2025). Pan-organ poly(A) atlas reveals a post-transcriptional regulatory layer independent of transcription.
