This repository was archived by the owner on Jul 13, 2024. It is now read-only.

A Python package to search, retrieve, and prepare gene expression data from Gene Expression Omnibus and Genomic Data Commons.


fogg-lab/transcriptomic-data-integrator

transcriptomic-data-integrator

A Python package to retrieve and prepare gene expression data from Gene Expression Omnibus and Genomic Data Commons.

Archived: This was my earlier attempt to automate common data retrieval and preprocessing tasks for gene expression data. Interested readers might want to check out these other resources instead:

The gdc module (to retrieve and prepare data from GDC) is not implemented.

Prerequisites

  • Python 3.9 or higher.
  • To use the functions for normalization or batch correction from the preprocess module, install R and packages:

For RMA normalization, you will need to install platform design info packages, such as:

These packages can be installed in an R environment by running the script install_r_packages.R. This install script was written for R 4.3.
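Before installing, it can help to confirm that the prerequisites above are in place. The following is a minimal sketch, not part of this package: the helper name `check_prerequisites` is hypothetical, and it assumes that R, if installed, exposes the `Rscript` executable on your `PATH`. It does not check which R packages are installed.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 9)):
    """Report whether the Python version and an R installation look usable.

    Hypothetical helper for illustration; not provided by the package.
    """
    return {
        # sys.version_info compares element-wise against a plain tuple.
        "python_ok": sys.version_info >= min_python,
        # Assumption: an R installation puts Rscript on PATH.
        "rscript_found": shutil.which("Rscript") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

R is only needed for the normalization and batch correction functions, so a missing `Rscript` does not block the rest of the package.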

Setup: Install and configure the transcriptomic_data_integrator package

Run the commands below at the command line. Replace YOUR_EMAIL@EXAMPLE.COM with your own email address, which is submitted with your GEO queries to the NCBI API.

git clone https://github.com/fogg-lab/transcriptomic-data-integrator.git
cd transcriptomic-data-integrator
pip install -e .
configure-ncbi-email YOUR_EMAIL@EXAMPLE.COM

Usage

Refer to the documentation and Colab notebooks.

Known limitations

  • The function tdi.geo.map_probes_to_genes is not guaranteed to work on all microarray platform technologies. This is due to differences in how the probe set annotation table is organized between different platforms.
  • Other GEO query functions, such as tdi.geo.get_geo_clinical_characteristics, fail when the study's data on GEO is not organized the way this package expects. This happens more often than not.
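Because these functions can fail on studies the package does not anticipate, one option when processing several studies is to wrap each call so that a single failure does not stop the run. The sketch below is an illustration only: `call_or_none` is a hypothetical helper, not part of the package, and it catches every exception because the specific exception types raised by the GEO functions are not documented here.

```python
import logging

logger = logging.getLogger(__name__)


def call_or_none(func, *args, **kwargs):
    """Call func(*args, **kwargs); return its result, or None if it raises.

    Hypothetical helper for illustration; not provided by the package.
    """
    try:
        return func(*args, **kwargs)
    except Exception as exc:  # broad on purpose: failure modes are undocumented
        name = getattr(func, "__name__", repr(func))
        logger.warning("%s failed: %s", name, exc)
        return None
```

With the package installed, a function such as tdi.geo.get_geo_clinical_characteristics could be passed as `func` along with whatever arguments it takes. Its signature is not shown in this README, so check the documentation before calling it. A `None` result then marks a study that needs manual handling.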

If you encounter any problems using the package, please submit an issue to report it.
