Pan-organ poly(A) atlas reveals a post-transcriptional regulatory layer independent of transcription
This study constructs a comprehensive poly(A) tail atlas by performing unprecedentedly deep, full-length nanopore sequencing across 18 mouse organs. The initial processing of raw data, from FAST5 files to poly(A) tail length measurement, was performed using the FLEP-seq analysis pipeline. This repository contains the code for the subsequent downstream analysis and visualization presented in the manuscript, "Pan-organ poly(A) atlas reveals a post-transcriptional regulatory layer independent of transcription".
- Interactive Data Portal: You can query, visualize, and download the poly(A) tail length and gene expression data from our Mouse Poly(A) Tail Atlas website.
- Raw Sequencing Data: The FLEP-seq2 data generated in this study have been deposited in the GSA (Genome Sequence Archive) database under accession number CRA028430.
- Analysis Code: All code for the preprocessing pipeline and downstream analysis is hosted in this GitHub repository.
- Python (version >= 3.8 is recommended)
  - numpy (>= 1.23)
  - pandas (>= 1.5.3)
  - scikit-learn (>= 1.2.2)
  - matplotlib (>= 3.7)
  - seaborn (>= 0.11.2)
  - pysam (>= 0.21.0)
  - scipy (>= 1.10.1)
  - gseapy (>= 1.0.5)

- R (version 4.2.2)
  - WGCNA (>= 1.73)
  - tidyverse (>= 1.3.2)
  - dplyr (>= 1.1.0)
  - BiocManager

- Command-line tools
  - Isoquant (v3.6.0)
  - samtools (v1.3.1)
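To confirm that an existing Python environment satisfies the minimum versions listed above, a small check along the following lines may help. This is a minimal sketch and not part of the repository: the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, and the version comparison is deliberately simple (numeric components only), so pre-release suffixes are ignored.

```python
from importlib import metadata

# Minimum versions copied from the dependency list above.
MINIMUM_VERSIONS = {
    "numpy": "1.23",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "matplotlib": "3.7",
    "seaborn": "0.11.2",
    "pysam": "0.21.0",
    "scipy": "1.10.1",
    "gseapy": "1.0.5",
}


def parse_version(text):
    """Turn '1.5.3' into (1, 5, 3), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to the same length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(requirements=MINIMUM_VERSIONS):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (need >= {minimum})")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name}: found {installed}, need >= {minimum}")
    return problems


if __name__ == "__main__":
    report = check_environment()
    for line in report or ["All listed Python packages meet the minimum versions."]:
        print(line)
```

Run it inside the activated Python environment; any line it prints names a package that is missing or older than the listed minimum.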
 
This project requires separate Python and R environments. The recommended method for managing these is conda.
First, clone this repository to your local machine and navigate into the directory.
git clone https://github.com/ZhaiLab-SUSTech/Mouse_polya_atlas.git
cd Mouse_polya_atlas
- Create the environment from the file: We provide a python_env.yml file to ensure all dependencies are correct. Save the following content as python_env.yml in the project directory:
- Create and activate the conda environment:
    # Create the environment
    conda env create -f python_env.yml
    # You can activate it when needed with:
    # conda activate data_prep_env
The R environment is used to run the main WGCNA analysis.
- Create a base R environment using conda:
    conda create -n wgcna_env -c conda-forge r-base=4.2.2
- Activate the new R environment:
    conda activate wgcna_env
- Install required R packages:
    Rscript install_packages.R
The project is organized into a main Snakemake workflow, supplemented by individual scripts for debugging and a modular collection of downstream analyses.
- Snakefile & config.yaml: The central Snakemake workflow and its configuration, orchestrating the entire preprocessing pipeline.
- data/: Contains annotation files and defines the expected directory structure for input data. Note: The fasta and gtf files in data/annotation/ are placeholders and should be replaced with the actual reference files before running the pipeline.
- scripts/: Contains all executable scripts.
  - preprocessing_pipeline/: Scripts for the main data processing workflow. Each step includes a submit_*.sh script for manual execution and debugging on a cluster.
  - Downstream_analysis/: A collection of modular Python and R scripts used to generate the figures and statistics for the manuscript.
  - utils/: General utility scripts called by the main pipelines.
 
- results/: Stores all output files generated by the analysis (this directory is not tracked by Git).
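Because the annotation files in data/annotation/ ship as placeholders, it can be worth checking that they have been replaced before starting a long run. The sketch below is illustrative only and rests on assumptions not stated in this README: that a placeholder is an empty (or near-empty) file, and that the reference files use the extensions .fa, .fasta, or .gtf and are not compressed. The function name `find_suspect_annotations` is hypothetical.

```python
from pathlib import Path

# Assumed file extensions for the reference FASTA and GTF files.
ANNOTATION_SUFFIXES = {".fa", ".fasta", ".gtf"}


def find_suspect_annotations(annotation_dir, min_bytes=1):
    """List annotation files that look like unreplaced placeholders.

    A file is flagged when it is smaller than `min_bytes`. This size test is
    an assumption about what a placeholder looks like, not a documented rule.
    """
    directory = Path(annotation_dir)
    if not directory.is_dir():
        return [f"{directory}: directory not found"]

    candidates = [
        path
        for path in sorted(directory.iterdir())
        if path.is_file() and path.suffix.lower() in ANNOTATION_SUFFIXES
    ]
    if not candidates:
        return [f"{directory}: no FASTA or GTF files found"]

    suspects = []
    for path in candidates:
        size = path.stat().st_size
        if size < min_bytes:
            suspects.append(f"{path.name}: only {size} bytes, looks like a placeholder")
    return suspects


if __name__ == "__main__":
    for message in find_suspect_annotations("data/annotation"):
        print(message)
```

Raising `min_bytes` (for example to a few kilobytes) makes the check stricter if the placeholders contain a short comment rather than being empty.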
The analysis is organized into a primary preprocessing workflow managed by Snakemake, followed by a series of downstream analysis scripts.
- Preprocessing Workflow
The entire preprocessing pipeline, from raw BAM files to the final distance matrices, is defined in the Snakefile.
- Downstream Analysis
The scripts located in scripts/Downstream_analysis/ are used to generate the figures and statistical results presented in the paper. These are designed to be run manually after the preprocessing workflow is complete. Please see the individual scripts for details on their inputs and outputs.
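For the preprocessing step, the Snakefile is normally run with the `snakemake` command from the repository root. The wrapper below is a hedged sketch of one way to do that from Python: it assumes Snakemake is installed and on PATH (it is not in the dependency list above), the core count is an arbitrary example, and the function names are hypothetical. It uses only the standard `--snakefile`, `--cores`, and `-n` (dry-run) options.

```python
import shutil
import subprocess


def build_snakemake_command(cores=4, dry_run=True, snakefile="Snakefile"):
    """Assemble the snakemake command line as a list of arguments."""
    command = ["snakemake", "--snakefile", snakefile, "--cores", str(cores)]
    if dry_run:
        # -n lists the jobs that would run without executing them.
        command.append("-n")
    return command


def run_workflow(cores=4, dry_run=True):
    """Run the workflow from the current directory; raises if snakemake is missing."""
    if shutil.which("snakemake") is None:
        raise RuntimeError("snakemake was not found on PATH; install it first")
    return subprocess.run(build_snakemake_command(cores, dry_run), check=True)


if __name__ == "__main__":
    # Print the dry-run command so it can be inspected or copied to a shell.
    print(" ".join(build_snakemake_command()))
```

A dry run first (`run_workflow(dry_run=True)`) shows which jobs Snakemake plans to execute; calling `run_workflow(cores=8, dry_run=False)` would then start the actual run.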
Note on Heatmaps: The pan-organ gene expression and poly(A) distribution heatmaps presented in the manuscript were generated using the tools available on our interactive Mouse Poly(A) Tail Atlas website. Therefore, the code for generating these specific figures is not included in this repository.
If you use our data or code in your research, please cite our paper:
Lei, H., Long, Y., Wu, S., Wang, X., Peng, Y., Liu, Z., Lu, W., Yi, S., Zou, M., Xia, Y., et al. (2025). Pan-organ poly(A) atlas reveals a post-transcriptional regulatory layer independent of transcription.