transcriptomic-data-integrator

A Python package to retrieve and prepare gene expression data from Gene Expression Omnibus and Genomic Data Commons.

Archived: This was my earlier attempt to automate common data retrieval and preprocessing tasks for gene expression data. Interested readers might want to check out these other resources instead:

The gdc module (to retrieve and prepare data from GDC) is not implemented.

Prerequisites

Python 3.9 or higher.
To use the normalization or batch correction functions in the preprocess module, install R and the following R packages:
- edgeR
- oligo
- readr
- sva

For RMA normalization, you will need to install platform design info packages, such as:

- pd.clariom.d.human
- pd.hg.u133.plus.2
- other packages for different platforms you might encounter

These packages can be installed in an R environment by running the script install_r_packages.R. This install script was written for R 4.3.
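Before calling the preprocess functions, it can help to confirm that R can see the required packages. The following is a minimal sketch, not part of this package: the helper names `build_r_check_expression` and `find_missing_r_packages` are hypothetical, and the sketch assumes `Rscript` is available on your PATH.

```python
import shutil
import subprocess

# R packages named in the prerequisites above.
REQUIRED_R_PACKAGES = ("edgeR", "oligo", "readr", "sva")


def build_r_check_expression(packages):
    """Build an R expression that prints each package that is not installed."""
    quoted = ", ".join(f'"{name}"' for name in packages)
    # setdiff() keeps the requested names that are absent from the installed set.
    return f"cat(setdiff(c({quoted}), rownames(installed.packages())), sep = '\\n')"


def find_missing_r_packages(packages=REQUIRED_R_PACKAGES):
    """Return the packages that R reports as missing (hypothetical helper).

    Raises RuntimeError if Rscript cannot be found on PATH.
    """
    rscript = shutil.which("Rscript")
    if rscript is None:
        raise RuntimeError("Rscript not found on PATH; install R first.")
    result = subprocess.run(
        [rscript, "-e", build_r_check_expression(packages)],
        capture_output=True,
        text=True,
        check=True,
    )
    # One package name per non-empty output line.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

Calling `find_missing_r_packages()` returns an empty list when everything is installed. Pass a longer tuple, for example one that includes `pd.hg.u133.plus.2`, to check platform design packages as well.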

Setup: Install and configure the `transcriptomic_data_integrator` package

Run the commands below at the command line. Replace the placeholder email with your own email address, which will be submitted with your GEO queries to the NCBI API.

git clone https://github.com/fogg-lab/transcriptomic-data-integrator.git
cd transcriptomic-data-integrator
pip install -e .
configure-ncbi-email YOUR_EMAIL@EXAMPLE.COM

Usage

Refer to the documentation and Colab notebooks.

Known limitations

The function tdi.geo.map_probes_to_genes is not guaranteed to work on all microarray platforms, because the probe set annotation table is organized differently from one platform to another.
Other GEO query functions, such as tdi.geo.get_geo_clinical_characteristics, fail when a study's data on GEO is not organized the way this package expects. This happens more often than not.
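Because these failures are common, one option when processing several studies is to isolate each call so that a single badly organized study does not stop the whole run. The sketch below is an illustration only: `run_per_study` is a hypothetical helper, and `fetch` stands for any callable you supply that takes one GEO accession. It does not assume any particular signature for the tdi.geo functions, so a stand-in function is used to demonstrate it.

```python
def run_per_study(fetch, accessions):
    """Call fetch(accession) for each study, collecting results and errors.

    Returns (results, failures): results maps accession to the return value,
    failures maps accession to a short error description.
    """
    results, failures = {}, {}
    for accession in accessions:
        try:
            results[accession] = fetch(accession)
        except Exception as exc:  # broad on purpose: failure modes vary by study
            failures[accession] = f"{type(exc).__name__}: {exc}"
    return results, failures


# Stand-in for a real query function; it fails for one made-up accession.
def fake_fetch(accession):
    if accession == "GSE0002":
        raise ValueError("unexpected table layout")
    return {"accession": accession}


results, failures = run_per_study(fake_fetch, ["GSE0001", "GSE0002"])
```

The `failures` dictionary gives you a list of studies to inspect by hand or to include in an issue report.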

If you encounter any problems using the package, please submit an issue to report them.
