NJLAGS SV Project

This repository contains data and scripts for Alibutud, Hansali, Cao, Zhou, Mahaganapathy, Azaro, Gwin, Wilson, Buyske, Bartlett, Flax, Brzustowicz, and Xing (2023). Structural variations contribute to the genetic etiology of autism spectrum disorder and language impairments. Int. J. Mol. Sci. 24(17), 13248.

Project structure

The project is divided into two pipelines (CNV and gSV/MEI). The gSV/MEI pipeline additionally incorporates results from the AF and eQTL analyses. Candidate genes from both pipelines feed into a shared analysis step.

      CNV                    gSV/MEI
 ┌────────────┐         ┌─────────────┐
 │            │         │AF Pipeline  │
 │CNV Pipeline│         │             │
 │            │         │eQTL Pipeline│
 └─────┬──────┘         └──────┬──────┘
       │                       │
       │   ┌───────────────┐   │
       └───┤Candidate genes├───┘
           └───────┬───────┘
                   │
              ┌────┴───┐
              │Analysis│
              └────────┘

CNV Pipeline

Input files

These files must be supplied from external sources; the pipeline does not generate them.

Builder phase:

starting_data_files: microarray files from Mahaganapathy
NJLAGS_CNV.ped: pedigree file of NJLAGS cohort
ASD_only.ped: pedigree file for ASD_only phenotype
ASD_LI.ped: pedigree file for ASD_LI phenotype
ASD_RI.ped: pedigree file for ASD_RI phenotype
human_g1k_v37.fasta: used for conversion to VCF
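
For readers unfamiliar with pedigree files, the sketch below shows one way to read a .ped file and list affected individuals per family. It assumes the common six-column, whitespace-delimited PED layout (family ID, individual ID, father ID, mother ID, sex, phenotype) with affected status coded as 2; the layout of the NJLAGS .ped files is not described here and may differ. The function names and the example rows are hypothetical.

```python
import io
from collections import defaultdict

# ASSUMPTION: standard six-column PED layout; the NJLAGS files may differ.
PED_COLUMNS = ["family_id", "individual_id", "father_id",
               "mother_id", "sex", "phenotype"]


def read_ped(handle):
    """Parse whitespace-delimited PED rows into dicts, skipping blank and '#' lines."""
    rows = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        rows.append(dict(zip(PED_COLUMNS, fields[:6])))
    return rows


def affected_by_family(rows):
    """Group individuals coded as affected (phenotype '2', an assumption) by family."""
    families = defaultdict(list)
    for row in rows:
        if row["phenotype"] == "2":
            families[row["family_id"]].append(row["individual_id"])
    return dict(families)


# Made-up example rows, for illustration only (not NJLAGS data).
example = io.StringIO(
    "FAM1 P1 0 0 1 1\n"
    "FAM1 P2 0 0 2 1\n"
    "FAM1 P3 P1 P2 1 2\n"
)
print(affected_by_family(read_ped(example)))
```

To read an actual file, pass an open file handle, for example `read_ped(open("NJLAGS_CNV.ped"))`.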

Prioritization phase:

NDD_genes.txt: neurodevelopmental disorder genes, from SFARI
dispensable_genes_and_muc: genes to exclude, from Rausell et al.
GTEx_2019_12_12.txt: gene expression file from GTEx
tpm1.txt/tpm2.txt/tpm3.txt: gene expression files from other sources
syndromes.bed: known ASD related syndromes, from SFARI

Instructions

The CNV_Master.py script runs the entire pipeline from start to finish. It proceeds through three phases, calling the corresponding Python script for each:

CNV_Builder.py
- Merge CNV files across batches and callers
- Convert to VCF format for use in Annotation
CNV_Annotator.py
- Apply QC filtering to remove outliers
- Run AnnotSV annotation on the VCF
- Run the GEMINI-based Python script “Geminesque.py” on the annotated VCF
- Filter variants by segregation analysis
CNV_Prioritizer.py
- Prioritize candidate variants by dispensability, coding region overlap, gene expression, and internal cohort frequency
- Call StrVCTVRE to predict pathogenicity
- Call SvAnna to predict pathogenicity

Additionally, the CNV_Summarizer.py script creates summary tables for analysis.

Running CNV_Master.py

CNV_Master.py already contains the commands for the constituent scripts. Only two values need to be set: the project folder where the pipeline is run and the proband phenotype of interest.

# DECLARE GLOBAL VARIABLES
projects_folder = "PROJECT FOLDER FILEPATH"
phenotype = "PHENOTYPE" # set phenotype
print("Projects folder located at: " + projects_folder)
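
As a rough illustration of the phase ordering listed under Instructions, the sketch below builds one command per phase and runs them in order. It is not the contents of CNV_Master.py. The helper names build_commands and run_pipeline are hypothetical, as is the assumption that each phase script accepts the projects folder and phenotype as positional arguments.

```python
import subprocess
import sys

# Phase scripts in the order listed under Instructions.
PHASE_SCRIPTS = ["CNV_Builder.py", "CNV_Annotator.py", "CNV_Prioritizer.py"]


def build_commands(projects_folder, phenotype):
    """Return one command per phase.

    ASSUMPTION: positional arguments are a hypothetical convention,
    not the scripts' documented interface.
    """
    return [[sys.executable, script, projects_folder, phenotype]
            for script in PHASE_SCRIPTS]


def run_pipeline(projects_folder, phenotype, dry_run=True):
    """Run the phases in order, stopping at the first failure."""
    for cmd in build_commands(projects_folder, phenotype):
        print("Running: " + " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so a failed phase prevents later phases from running.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values, as in the snippet above; dry_run only prints.
    run_pipeline("PROJECT FOLDER FILEPATH", "PHENOTYPE", dry_run=True)
```

Running the phases with check=True means an error in the Builder phase halts the run before annotation starts, which avoids annotating an incomplete VCF.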

gSV Pipeline

Instructions

To run the gSV pipeline, run the following scripts in the /doc/ folder in order:

0_merging_callsets.sh
1_annotation.sh
2_inh_segregation.sh
3_adding_datasets.sh
4_svanna_psv_score.sh
5_universal_filtering.sh
6_results.sh
7_tables_and_figures.sh
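
The numeric prefixes define the run order. A minimal sketch of a driver that runs the scripts in that order is shown below; no such driver is described in this repository, and the function names, the script_dir argument, and the use of bash to invoke each script are assumptions.

```python
import re
import subprocess
from pathlib import Path


def ordered_scripts(names):
    """Sort script names by their leading integer prefix (e.g. '3_x.sh' -> 3)."""
    numbered = []
    for name in names:
        match = re.match(r"(\d+)_", name)
        if match:  # ignore files without a numeric prefix
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def run_all(script_dir, dry_run=True):
    """Run each numbered .sh script with bash, stopping at the first failure."""
    names = [p.name for p in Path(script_dir).glob("*.sh")]
    for name in ordered_scripts(names):
        print("Running " + name)
        if not dry_run:
            # check=True stops the sequence if a step exits with an error.
            subprocess.run(["bash", str(Path(script_dir) / name)], check=True)
```

Sorting on the parsed integer rather than the raw string keeps the order correct if a step numbered 10 or higher is ever added.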

Figures and Tables

This directory contains three folders with the scripts used to generate the figures and tables for this project. Two of them contain the scripts for the CNV and gSV pipeline figures/tables respectively, while the third contains the scripts for generating the protein-protein interaction network used in Figure 5.

CNV figures and tables

Starting_Data_Scripts.py - Used to build Table 1 and Table 2
CNV_Summarizer.py - Used to build Table 5
CNV_Plots.py - Used to plot Figure S1(A)
CNV_Enrichment.py - Used to build table for use in Figure S1(B)
BoxPlotEnrichment.R - Used to plot Figure S1(B)

gSV figures and tables

7_tables_and_figures.sh is used to generate the tables and figures
GenerateGeneLevelSummaryTable_CNV.py - Used to build CNV input table for Table 4
GenerateGeneLevelSummaryTable_MEISV.py - Used to build gSV input table for Table 4
GenerateGeneLevelSummaryTable.py - Used to build Table 4
GenerateSizeDistributionChartsByMEI.py - Used to plot Figure S2

PPI Network

This folder contains the files and scripts used to generate Figure 5. Pathway data was drawn from ConsensusPathDB, and input genes from the results of the combined CNV/gSV pipeline.

Network_gene.py - Used to build gene table referenced by Network_nodes_edges.py
Network_nodes_edges.py - Used to build table defining network structure
Network_plot.R - Used to plot PPI Network
