ylaboratory/ANDES

ANDES

Algorithm for Network Data Embedding and Similarity analysis

ANDES is a suite of standalone scripts for measuring the similarity between gene sets using precomputed gene embeddings. It includes a consensus protein–protein interaction network embedding generated with node2vec and a sample gene set database (Gene Ontology Biological Process gene sets for Homo sapiens).
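Gene set databases are passed to the scripts as GMT files. In that tab-separated format, each line holds one gene set: its name, a description, and then the member gene IDs. The sketch below shows one way to read such a file into a dictionary. The function name `read_gmt` and the set and gene names in the example are hypothetical, chosen only for illustration. They are not part of ANDES.

```python
import io
from typing import Dict, List, TextIO


def read_gmt(handle: TextIO) -> Dict[str, List[str]]:
    """Hypothetical helper: parse GMT lines of the form name<TAB>description<TAB>gene1<TAB>gene2..."""
    genesets: Dict[str, List[str]] = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines (no genes listed)
        name, _description, *genes = fields
        genesets[name] = [g for g in genes if g]  # drop empty trailing fields
    return genesets


# Placeholder set names and gene IDs, for illustration only
example = "set_A\tfirst example set\tg1\tg2\nset_B\tsecond example set\tg3\n"
sets = read_gmt(io.StringIO(example))
print(sets)  # {'set_A': ['g1', 'g2'], 'set_B': ['g3']}
```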

Features

  • Gene-set similarity: Compute pairwise similarity scores between two gene sets in embedding space.
  • Embedding-based GSEA: Perform a ranked Gene Set Enrichment Analysis (GSEA) using embedding-derived gene rankings.
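The cited paper describes a best-match approach. As a rough illustration, the sketch below matches each gene in one set to its most similar gene in the other set by cosine similarity. It averages these best matches in both directions and then calibrates the score against random gene sets of the same sizes. This is an assumption-laden sketch, not the ANDES implementation. The exact weighting and null model ANDES uses are defined in src/set_analysis_fun.py and may differ. The names `best_match_similarity` and `permutation_zscore` are hypothetical.

```python
import numpy as np


def best_match_similarity(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Symmetric best-match score between two gene sets (rows = gene embeddings)."""
    # L2-normalize rows so that dot products are cosine similarities
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sim = a @ b.T  # shape (|A|, |B|)
    a_to_b = sim.max(axis=1).mean()  # best partner in B for each gene in A
    b_to_a = sim.max(axis=0).mean()  # best partner in A for each gene in B
    return float((a_to_b + b_to_a) / 2)


def permutation_zscore(all_emb: np.ndarray, idx_a, idx_b, n_perm: int = 200, seed: int = 0) -> float:
    """Assumed null model: z-score against random gene sets of the same sizes."""
    rng = np.random.default_rng(seed)
    observed = best_match_similarity(all_emb[idx_a], all_emb[idx_b])
    null = np.empty(n_perm)
    for i in range(n_perm):
        ra = rng.choice(len(all_emb), size=len(idx_a), replace=False)
        rb = rng.choice(len(all_emb), size=len(idx_b), replace=False)
        null[i] = best_match_similarity(all_emb[ra], all_emb[rb])
    return float((observed - null.mean()) / null.std())


# Toy example: set A has two orthogonal genes, set B has one gene identical to A's first
A = np.array([[1.0, 0.0], [0.0, 1.0]])
B = np.array([[1.0, 0.0]])
score = best_match_similarity(A, B)
print(score)  # 0.75: A->B averages (1 + 0) / 2, B->A is 1

toy_emb = np.random.default_rng(1).normal(size=(50, 8))
z = permutation_zscore(toy_emb, [0, 1, 2], [3, 4])
```

Averaging both directions keeps the score symmetric. Without it, a small set whose genes all resemble one gene of a large set would score highly in one direction only.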

These functions are implemented in src/set_analysis_fun.py, and the demo Jupyter notebook (demo.ipynb) shows sample usage.

Citation

If you use ANDES in your work, please cite:

A best-match approach for gene set analyses in embedding spaces. Li L, Dannenfelser R, Cruz C, Yao V. Genome Research. 2024.

Installation

  1. Install conda if you haven't already.
  2. Create and activate the ANDES environment:
conda env create -f env.yml
conda activate ANDES

Usage

To get started quickly, we recommend the demo notebook, demo.ipynb. Alternatively, both analyses (gene set similarity and GSEA) can be run from the command line with the following commands.

Compute the similarity between all pairs of gene sets from two gene set databases (GMT files):

python src/andes.py --emb embedding_file.csv --genelist embedding_gene_ids.txt --geneset1 first_gene_set_database.gmt --geneset2 second_gene_set_database.gmt --out output_file.csv -n num_processor
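The --emb and --genelist arguments work together: the embedding file holds one vector per gene, and the gene list names those genes. The sketch below assumes a specific layout, which you should check against the provided files. It assumes the CSV has no header and one comma-separated row per gene, and that row i corresponds to line i of the ID file. It loads both files and maps a gene set to embedding rows, dropping genes that have no embedding. `load_embedding` and `to_indices` are hypothetical names.

```python
import os
import tempfile

import numpy as np


def load_embedding(emb_path: str, genelist_path: str):
    """Assumed layout: headerless CSV, row i is the vector for line i of the gene ID file."""
    emb = np.loadtxt(emb_path, delimiter=",", ndmin=2)
    with open(genelist_path) as fh:
        genes = [line.strip() for line in fh if line.strip()]
    if len(genes) != emb.shape[0]:
        raise ValueError(f"{len(genes)} gene IDs but {emb.shape[0]} embedding rows")
    return emb, {gene: i for i, gene in enumerate(genes)}


def to_indices(geneset, gene_index):
    """Map gene IDs to embedding rows, silently skipping genes without an embedding."""
    return [gene_index[g] for g in geneset if g in gene_index]


# Toy files written to a temporary directory (placeholder gene IDs)
with tempfile.TemporaryDirectory() as tmp:
    emb_path = os.path.join(tmp, "emb.csv")
    ids_path = os.path.join(tmp, "ids.txt")
    with open(emb_path, "w") as fh:
        fh.write("0.1,0.2,0.3\n0.4,0.5,0.6\n")
    with open(ids_path, "w") as fh:
        fh.write("g1\ng2\n")
    emb, gene_index = load_embedding(emb_path, ids_path)

rows = to_indices(["g2", "g_missing", "g1"], gene_index)
print(emb.shape, rows)  # (2, 3) [1, 0]
```

Checking that the row count matches the ID count catches the most common mistake, which is pairing an embedding with the wrong gene list.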

Compute a rank-based comparison (GSEA) of a gene set database (such as Gene Ontology) against a ranked list of genes:

python src/andes_gsea.py --emb embedding_file.csv --genelist embedding_gene_ids.txt --geneset gene_set_database.gmt --rankedlist ranked_genes.txt --out output_file.csv -n num_processor
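For orientation, the sketch below computes the classic running-sum enrichment score from standard GSEA. It walks down the ranked list, stepping up at genes in the set (weighted by |weight|^p) and down at genes outside it, and reports the signed maximum deviation from zero. This is not the ANDES algorithm. How andes_gsea.py brings embedding similarity into the comparison is defined in src/set_analysis_fun.py. The function name `enrichment_score` and its parameters are hypothetical.

```python
import numpy as np


def enrichment_score(ranked_genes, gene_set, weights=None, p: float = 1.0) -> float:
    """Standard GSEA running-sum enrichment score (not the ANDES embedding variant)."""
    in_set = np.array([g in gene_set for g in ranked_genes])
    n, n_hits = len(ranked_genes), int(in_set.sum())
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must partially overlap the ranked list")
    if weights is None:
        weights = np.ones(n)  # unweighted walk when no ranking statistic is given
    w = np.abs(np.asarray(weights, dtype=float)) ** p
    hit_step = np.where(in_set, w, 0.0) / w[in_set].sum()  # hits sum to 1
    miss_step = np.where(in_set, 0.0, 1.0 / (n - n_hits))  # misses sum to 1
    running = np.cumsum(hit_step - miss_step)
    # ES is the signed value at the largest absolute deviation from zero
    return float(running[np.argmax(np.abs(running))])


ranked = ["g1", "g2", "g3", "g4"]  # placeholder IDs, best-ranked first
print(enrichment_score(ranked, {"g1", "g2"}))  # 1.0: set concentrated at the top
print(enrichment_score(ranked, {"g3", "g4"}))  # -1.0: set concentrated at the bottom
```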

About

A set of Python scripts for comparing similarity between gene sets using gene embeddings.
