Dowell-Lab/DBNascent-build

DBNascent_build

For easily accessing DBNascent metadata and data files, please see nascent.colorado.edu.

This repository is intended for building, updating, and querying DBNascent, a MySQL database cataloguing all nascent sequencing experiments in the SRA through 2020. The database was built and is maintained by the DnA Lab at the University of Colorado Boulder.

The database draws on manually curated metadata tables, quality control data, and bidirectional call data from samples. All data are stored on the Fiji cluster at CU Boulder.

Version 1.3

Version notes (01/14/2025):

  • New fields added

| Table | Field | Description |
| --- | --- | --- |
| papers | geo | GEO accession number |
| papers | full_citation | Full paper citation |
| samples | raw_strandedness | Strandedness of raw FASTQ data |
| samples | mapped_strandedness | Strandedness of processed FASTQ data used as input for mapping |
| samples | wildtype_untreated | True only if a sample is a wildtype cell and untreated or vehicle-treated |
| samples | fcgene_avail | Gene counts available for sample |
| samples | fcbidir_avail | Master file bidirectional counts available for sample |
| samples | tfit_avail | Tfit bidirectional calls available for sample |
| samples | dreg_avail | dREG bidirectional calls available for sample |
| samples | tdf_avail | TDF visualization file available for sample |

Dependencies

The database was built with Python 3.6.3. The following packages are required whether you are building or querying the database:

configparser v5.2.0 or higher
numpy v1.19.2 or higher
pyyaml v5.4.1 or higher (imported as yaml)
pymysql v1.0.2 or higher (a different MySQL driver may be substituted)
sqlalchemy v1.4.31 or higher
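
As a minimal sketch of what substituting a different MySQL driver involves: SQLAlchemy selects the driver from the `dialect+driver` prefix of the connection URL, so the swap is a one-word change. The helper name `mysql_url`, the host, and the database name below are illustrative assumptions, not values taken from this repository.

```python
from urllib.parse import quote_plus


def mysql_url(user, password, host, database, driver="pymysql"):
    """Build a SQLAlchemy-style URL: mysql+<driver>://user:password@host/database.

    Hypothetical helper for illustration; it is not part of dbutils.py.
    """
    # Percent-encode the credentials so characters such as "@" or "/"
    # do not break URL parsing.
    return f"mysql+{driver}://{quote_plus(user)}:{quote_plus(password)}@{host}/{database}"


# Placeholder values throughout; "localhost" and "dbnascent" are assumptions.
url = mysql_url("example_user", "p@ss/word", "localhost", "dbnascent")

# Swapping the driver only changes the prefix (here mysqlclient, which
# SQLAlchemy addresses as "mysqldb").
alt_url = mysql_url("example_user", "p@ss/word", "localhost", "dbnascent", driver="mysqldb")

# With sqlalchemy and the chosen driver installed, either string could be
# passed to sqlalchemy.create_engine(url).
```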

Database schema

[Figure: DBNascent database schema]
(Generated with https://github.com/sqlalchemy/sqlalchemy/wiki/SchemaDisplay)

Usage

All database objects and functions are defined in dborm.py and dbutils.py.

Building and maintaining DBNascent:

To integrate seamlessly with the Django website that queries this database, the tables should initially be created through a Django migration within the website repository on GitLab. The schemas specified for Django are the same as those specified here, apart from a few additional tables generated by Django, so the database can be created with this repository alone if necessary.

config_build.py defines file paths and fields outside of and within the database. Adding a field to a metadata table requires adding it to the config_build.py file as well.

organisms.txt, sample_cell_types.txt, and searcheq.txt are manually curated tables defining organisms, tissues, and unique values within the database. Adding data may require adding additional lines to these files.

The main scripts for building the database are db_global_add_update.py and db_paper_add_update.py, combined in the db_build_full.sbatch script.

Querying DBNascent:

query_printout.py queries the database using defined fields and filtering specifications and prints the results for input into DESeq2 or other applications. The script relies on the config_query.txt config file, as well as dborm.py and dbutils.py. A query too complex for this approach can be written manually in MySQL, passed to the database, and printed out with the manual_query_printout.py script.
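
To illustrate the kind of filtered query and tab-delimited printout described above, here is a rough sketch. It does not reproduce query_printout.py or manual_query_printout.py. An in-memory SQLite table stands in for the MySQL samples table so the example runs without credentials. The wildtype_untreated and tfit_avail fields come from the version notes, while the sample_name column, the integer encoding of the flags, and the sample rows are assumptions.

```python
import csv
import io
import sqlite3

# Stand-in for the real database: a tiny in-memory table with assumed columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples ("
    "sample_name TEXT, wildtype_untreated INTEGER, tfit_avail INTEGER)"
)
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [("sampleA", 1, 1), ("sampleB", 0, 1), ("sampleC", 1, 0)],  # made-up rows
)

# Keep only wildtype, untreated samples that have Tfit calls available.
query = (
    "SELECT sample_name, tfit_avail FROM samples "
    "WHERE wildtype_untreated = 1 AND tfit_avail = 1 "
    "ORDER BY sample_name"
)
cursor = conn.execute(query)
header = [column[0] for column in cursor.description]
rows = cursor.fetchall()
conn.close()

# Write a header line plus tab-delimited rows, a layout that downstream
# tools can read as a table.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
printout = buffer.getvalue()
print(printout, end="")
```

The SELECT statement uses only standard SQL, so the same text should also be valid as a manual MySQL query, provided the column names match the actual schema.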

Both config files refer to a credentials file containing your credentials for accessing the database. This file should be a one-line, two-column, tab-delimited file: <username><tab><password>
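
The credentials file format can be sketched as follows. The function name `read_credentials` is hypothetical and the repository's own parsing code may differ. The example writes a throwaway file with placeholder values and reads it back.

```python
import os
import tempfile


def read_credentials(path):
    """Return (username, password) from a one-line, tab-delimited file.

    Hypothetical helper for illustration; it is not part of dbutils.py.
    """
    with open(path) as handle:
        line = handle.readline().rstrip("\r\n")
    fields = line.split("\t")
    if len(fields) != 2:
        raise ValueError("expected exactly two tab-separated columns")
    return fields[0], fields[1]


# Demonstration with a temporary file; the values are placeholders.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("example_user\texample_password\n")
    creds_path = tmp.name

username, password = read_credentials(creds_path)
os.remove(creds_path)  # clean up the throwaway file
```

Because the file holds a plaintext password, it is worth restricting its permissions to your own user.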

About

Python scripts and utilities for building DBNascent in MySQL
