For easy access to DBNascent metadata and data files, please see nascent.colorado.edu.
This repository is intended for building, updating, and querying DBNascent, a MySQL database cataloguing all nascent sequencing experiments in the SRA through 2020. The database was built and is maintained by the DnA Lab at the University of Colorado Boulder.
The database draws on manually curated metadata tables, quality control data, and per-sample bidirectional call data. All of the data are stored on the Fiji cluster at CU Boulder.
Version notes (01/14/2025):
- New fields added
Table | Field | Description |
---|---|---|
papers | geo | GEO accession number |
papers | full_citation | Full paper citation |
samples | raw_strandedness | Strandedness of raw FASTQ data |
samples | mapped_strandedness | Strandedness of processed FASTQ data used as input for mapping |
samples | wildtype_untreated | True only if a sample is a wildtype cell and untreated or vehicle-treated |
samples | fcgene_avail | Gene counts available for sample |
samples | fcbidir_avail | Master file bidirectional counts available for sample |
samples | tfit_avail | Tfit bidirectional calls available for sample |
samples | dreg_avail | dREG bidirectional calls available for sample |
samples | tdf_avail | TDF visualization file available for sample |
The database was built with Python 3.6.3. The following packages are required for either building or querying:
- configparser v5.2.0 or higher
- numpy v1.19.2 or higher
- yaml (PyYAML) v5.4.1 or higher
- pymysql v1.0.2 or higher (a different MySQL driver may be substituted)
- sqlalchemy v1.4.31 or higher
(Schema diagram generated with https://github.com/sqlalchemy/sqlalchemy/wiki/SchemaDisplay)
All database objects and functions are defined in dborm.py and dbutils.py.
To integrate seamlessly with the Django website that queries this database, the tables should initially be created through a Django migration within the website repository on GitLab. The schemas specified for Django are the same as those specified here, apart from a few additional tables that Django generates, so the database can be created with this repository alone if necessary.
`config_build.py` defines file paths and fields outside of and within the database. Adding a field to a metadata table requires adding it to the `config_build.py` file as well.
`organisms.txt`, `sample_cell_types.txt`, and `searcheq.txt` are manually curated tables defining organisms, tissues, and unique values within the database. Adding data may require adding additional lines to these files.
The main scripts for building the database are `db_global_add_update.py` and `db_paper_add_update.py`, combined in the `db_build_full.sbatch` script.
The database can be queried with defined fields and filtering specifications using `query_printout.py`, which produces output suitable for DESeq2 or other applications. This script relies on the `config_query.txt` config file, as well as `dborm.py` and `dbutils.py`. If a query is too complex to express this way, it may require a manual MySQL query, which can be passed to the database and printed out with the `manual_query_printout.py` script.
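To illustrate the idea of passing a manual query to the database and printing the result, here is a minimal sketch. It is not the implementation of `manual_query_printout.py`. The `print_query` helper and the `sample_name` column are hypothetical, and an in-memory `sqlite3` table stands in for the MySQL `samples` table so the example runs without a server. A `pymysql` connection exposes the same DB-API cursor calls, so the helper should work with it as well.

```python
import csv
import sqlite3
import sys


def print_query(connection, sql, params=None, out=None):
    """Run a query on a DB-API connection and write tab-delimited rows.

    print_query is a hypothetical helper, not a function from this repository.
    """
    out = sys.stdout if out is None else out
    cursor = connection.cursor()
    if params is None:
        cursor.execute(sql)
    else:
        # Placeholder syntax depends on the driver: "?" in sqlite3, "%s" in pymysql.
        cursor.execute(sql, params)
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # First row: column names taken from the cursor description.
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor.fetchall())
    cursor.close()


# Toy stand-in for the samples table; sample_name is an assumed column name.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE samples ("
    "sample_name TEXT, wildtype_untreated INTEGER, tfit_avail INTEGER)"
)
connection.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [("sample_A", 1, 1), ("sample_B", 0, 1), ("sample_C", 1, 0)],
)

# Wildtype, untreated samples that have Tfit bidirectional calls available.
QUERY = (
    "SELECT sample_name FROM samples "
    "WHERE wildtype_untreated = 1 AND tfit_avail = 1"
)
print_query(connection, QUERY)
```

Writing through `csv.writer` with a tab delimiter keeps the output easy to load into R or pandas for downstream tools such as DESeq2.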
Both config files refer to a credentials file that contains your credentials for accessing the database. This file should be a one-line two-column tab delimited file: <username><tab><password>
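As a rough sketch of how such a credentials file could be consumed, the snippet below reads the single tab-delimited line and assembles a SQLAlchemy-style connection URL for the `pymysql` driver. The function names, the host, and the database name are assumptions made for illustration. The repository's own handling lives in `dbutils.py` and may differ.

```python
from urllib.parse import quote_plus


def read_credentials(path):
    """Return (username, password) from a one-line, tab-delimited file."""
    with open(path) as handle:
        line = handle.readline().rstrip("\r\n")
    # Split on the first tab only, so the rest of the line is the password.
    username, password = line.split("\t", 1)
    return username, password


def build_mysql_url(username, password, host, database):
    """Assemble a mysql+pymysql URL; host and database are supplied by the caller.

    quote_plus escapes characters such as '@' or '/' in the credentials,
    which would otherwise be misread as URL delimiters.
    """
    return "mysql+pymysql://{}:{}@{}/{}".format(
        quote_plus(username), quote_plus(password), host, database
    )
```

The resulting string can be passed to `sqlalchemy.create_engine`. For example, `build_mysql_url(*read_credentials("creds.txt"), "localhost", "dbnascent")`, where the path, host, and database name are placeholders.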