Merge relax-md-req into master #1122

Merged
merged 98 commits on Apr 29, 2015
Changes from 20 commits
94bf806
Modifying the database
josenavas Apr 17, 2015
b3aa78d
Fixing populate_test_db.sql
josenavas Apr 17, 2015
fd13c59
Renaming tables as the old names do not make sense now
josenavas Apr 17, 2015
cf85cfd
Merge branch 'fix-extend' into fix-metadata-creation
josenavas Apr 17, 2015
b677ba0
Fixing sample obj tests
josenavas Apr 17, 2015
7896e22
Correctly merging the column values in the sample template
josenavas Apr 17, 2015
86f6474
Merge branch 'relax-db' into fix-sample-obj
josenavas Apr 17, 2015
cc3b359
Fixing latitude/longitude types on populate
josenavas Apr 17, 2015
de3659e
Merge branch 'relax-db' into fix-sample-obj
josenavas Apr 17, 2015
6472cf3
Fixing ReadOnly tests for PrepSample
josenavas Apr 17, 2015
4e87d20
Fixing ReadWrite tests for PrepSample
josenavas Apr 17, 2015
8ec3b27
Adding the column_restriction module
josenavas Apr 17, 2015
a13a7b8
Fixing SampleTemplate ReadOnly tests
josenavas Apr 17, 2015
c64ab6c
Fixng all SampleTemplate tests
josenavas Apr 17, 2015
5f23a55
Removing column restriction as it can be added in constants
josenavas Apr 17, 2015
ef284f3
Fixing TestPrepTemplateReadOnly
josenavas Apr 17, 2015
694f4ab
Fixing all prep template object tests
josenavas Apr 17, 2015
3728f63
All tests under metadata_template passing
josenavas Apr 17, 2015
b145fca
Fixing flake8
josenavas Apr 17, 2015
85a32dc
Merge branch 'master' of https://github.com/biocore/qiita into relax-db
josenavas Apr 21, 2015
4ea0312
Merge branch 'master' of https://github.com/biocore/qiita into fix-sa…
josenavas Apr 21, 2015
2990e23
Merge remote-tracking branch 'upstream/cart-branch' into relax-md-req
josenavas Apr 22, 2015
af248cf
Preparing files for the merge
josenavas Apr 22, 2015
b2721eb
Solving the hell of the merge conflict
josenavas Apr 22, 2015
771c27d
Merge branch 'relax-db' into fix-sample-obj
josenavas Apr 22, 2015
78f763a
Fix merge conflicts
josenavas Apr 23, 2015
fbdf463
Fixing test_setup.py
josenavas Apr 23, 2015
ff4aae6
Fixing util.py
josenavas Apr 23, 2015
f7e62bd
Fixing data.py
josenavas Apr 23, 2015
10d9a0e
Fixing search.py
josenavas Apr 23, 2015
299c52f
Chaning queue name
josenavas Apr 23, 2015
18efa9c
Adding the missing prep template to the DB
josenavas Apr 23, 2015
e845794
Fixing type on populate_test_db.sql
josenavas Apr 23, 2015
a8b7d98
Doing all the patch in SQL
josenavas Apr 24, 2015
8772f32
Merge branch 'relax-db' into fix-sample-obj
josenavas Apr 24, 2015
013d126
Merge branch 'fix-sample-obj' into fix-metadata-obj
josenavas Apr 24, 2015
e65a41d
Merge branch 'fix-metadata-obj' into fix-qiita-db-tests
josenavas Apr 24, 2015
c6147b0
Merge branch 'fix-qiita-db-tests' into fix-analysis-tests
josenavas Apr 24, 2015
0ddb98a
Atatching the new prep template file to the prep template
josenavas Apr 24, 2015
5d7a752
fixing test_setup.py
josenavas Apr 24, 2015
e9fab3e
Fixing test_reference.py by removing magic numbers
josenavas Apr 24, 2015
ff305f9
Fixing test_job.py by removing magic numbers
josenavas Apr 24, 2015
01ec8bc
Fixing test_prep_template.py by removing magic numbers
josenavas Apr 24, 2015
69cfcc1
Fixing test_meta_util.py
josenavas Apr 24, 2015
c5f3107
Adding the qiime mapping file
josenavas Apr 24, 2015
4ef8b42
Fixing all the analysis tests. Fixes partially #247. Fixes #465
josenavas Apr 24, 2015
847485b
Fixing tests due to the addition of the mapping file
josenavas Apr 25, 2015
5ee643f
addressing @squirrelo's comments
josenavas Apr 25, 2015
3d4cd42
Removing patch as per @squirrelo's suggestion
josenavas Apr 25, 2015
e75c372
Fixing qiita ware test util
josenavas Apr 25, 2015
fa4a68b
Fixing qiita ware tests
josenavas Apr 25, 2015
b43c3cc
Flake8
josenavas Apr 25, 2015
69395bf
Merge branch 'relax-db' into fix-sample-obj
josenavas Apr 25, 2015
ba49dc5
Reading python patch as now it has the needed functionality
josenavas Apr 25, 2015
3d4d7dd
Merge branch 'fix-metadata-obj' into fix-qiita-db-tests
josenavas Apr 25, 2015
e7cefb0
Merge branch 'fix-qiita-db-tests' into fix-analysis-tests
josenavas Apr 25, 2015
658a5b5
Merge branch 'fix-analysis-tests' into fix-qiita-ware-tests
josenavas Apr 25, 2015
6177e4e
Merge pull request #1073 from josenavas/relax-db
antgonza Apr 27, 2015
0439de9
Merge branch 'relax-md-req' of https://github.com/biocore/qiita into …
josenavas Apr 27, 2015
157b60a
Adding qiime-map property to the prep template
josenavas Apr 27, 2015
2c5027e
Merge pull request #1074 from josenavas/fix-sample-obj
adamrp Apr 27, 2015
57ed605
Merge branch 'master' of https://github.com/biocore/qiita into fix-me…
josenavas Apr 27, 2015
2a29012
Merge branch 'relax-md-req' of https://github.com/biocore/qiita into …
josenavas Apr 27, 2015
b2c4f70
Merge branch 'master' of https://github.com/biocore/qiita into relax-…
josenavas Apr 27, 2015
0cfab56
Merge branch 'relax-md-req' of https://github.com/biocore/qiita into …
josenavas Apr 27, 2015
62f0d3e
Merge branch 'relax-md-req' of https://github.com/biocore/qiita into …
josenavas Apr 27, 2015
f81caa9
Fixing call to clean validate template
josenavas Apr 27, 2015
134678d
Removing all warnings from tests
josenavas Apr 27, 2015
6aa5335
Fixing create qiime mapping file add_filepath call
josenavas Apr 27, 2015
1cc26e9
Addressing comments
josenavas Apr 27, 2015
2d3df08
Reducing the rename_cols dict
josenavas Apr 28, 2015
79256ff
Merge pull request #1075 from josenavas/fix-metadata-obj
antgonza Apr 28, 2015
0829f95
Merge branch 'relax-md-req' of https://github.com/biocore/qiita into …
josenavas Apr 28, 2015
d777469
Addressing comments from @ElDeveloper and @squirrelo
josenavas Apr 28, 2015
b26a867
Merge pull request #1099 from josenavas/fix-qiita-db-tests
squirrelo Apr 28, 2015
e76a10c
Merge branch 'relax-md-req' of https://github.com/biocore/qiita into …
josenavas Apr 28, 2015
d1bee5e
Addressing @ElDeveloper comments
josenavas Apr 28, 2015
8be914f
Removing inference - good catch @ElDeveloper\!
josenavas Apr 28, 2015
b604ae1
Merge pull request #1106 from josenavas/fix-analysis-tests
ElDeveloper Apr 28, 2015
a663d33
Merge branch 'relax-md-req' of https://github.com/biocore/qiita into …
josenavas Apr 28, 2015
b19b60c
Merge branch 'relax-md-req' of https://github.com/biocore/qiita into …
josenavas Apr 28, 2015
e7363a3
Merge branch 'add-qiime-map-func' into fix-qiita-ware-tests
josenavas Apr 28, 2015
d464237
Fixing bug in add filepath
josenavas Apr 28, 2015
84beb81
Fixing _get_qiime_minimal_mapping
josenavas Apr 28, 2015
c30dd8c
Fixing qiime map parsing
josenavas Apr 28, 2015
e5ae3e4
Removing warnings from tests:
josenavas Apr 28, 2015
14b0c49
Addressing comments
josenavas Apr 29, 2015
0c9a47a
Adding test with reverse linker primer
josenavas Apr 29, 2015
6c83ba4
Fixing WTF failures
josenavas Apr 29, 2015
7390538
Merge pull request #1107 from josenavas/fix-qiita-ware-tests
ElDeveloper Apr 29, 2015
a546233
Reverting the comment from @squirrelo on search for JOIN...ON -> USIN…
josenavas Apr 29, 2015
402003a
Removing the ambiguity
josenavas Apr 29, 2015
d87890b
Removing join as now all the columns are in the dynamic table...
josenavas Apr 29, 2015
f73887f
Execute the tests even if you change the format... there might be a s…
josenavas Apr 29, 2015
4830b24
Merge branch 'master' of https://github.com/biocore/qiita into join-u…
josenavas Apr 29, 2015
7a29adc
Fixing bug in _build_mapping_file and create a test for it
josenavas Apr 29, 2015
fa2392c
Adding tests for test_build_biom_tables
josenavas Apr 29, 2015
1a163fb
Merge pull request #1126 from josenavas/join-using-instead-on-db-issues
antgonza Apr 29, 2015
256 changes: 66 additions & 190 deletions qiita_db/metadata_template/base_metadata_template.py

Large diffs are not rendered by default.

74 changes: 69 additions & 5 deletions qiita_db/metadata_template/constants.py
@@ -6,10 +6,74 @@
# The full license is in the file LICENSE, distributed with this software.
# -----------------------------------------------------------------------------

from collections import namedtuple
from future.utils import viewkeys, viewvalues

Restriction = namedtuple('Restriction', ['columns', 'error_msg'])

# A dict containing the restrictions that apply to the sample templates
SAMPLE_TEMPLATE_COLUMNS = {
# The following columns are required by EBI for submission
'EBI': Restriction(columns={'collection_timestamp': 'timestamp',
'physical_specimen_location': 'varchar'},
error_msg="EBI submission disabled"),
# The following columns are required for the official main QIITA site
'qiita_main': Restriction(columns={'sample_type': 'varchar',
'description': 'varchar',
'physical_specimen_remaining': 'bool',
'dna_extracted': 'bool',
'latitude': 'float8',
'longitude': 'float8',
'host_subject_id': 'varchar'},
error_msg="Processed data approval disabled")
}

# A dict containing the restrictions that apply to the prep templates
PREP_TEMPLATE_COLUMNS = {
# The following columns are required by EBI for submission
'EBI': Restriction(
columns={'primer': 'varchar',
'center_name': 'varchar',
'platform': 'varchar',
'library_construction_protocol': 'varchar',
'experiment_design_description': 'varchar'},
error_msg="EBI submission disabled")
}

# Different prep templates have different requirements depending on the data
# type. We create a dictionary for each of these special datatypes

TARGET_GENE_DATA_TYPES = ['16S', '18S', 'ITS']
REQUIRED_TARGET_GENE_COLS = {'barcodesequence', 'linkerprimersequence',
'run_prefix', 'library_construction_protocol',
'experiment_design_description', 'platform'}
RENAME_COLS_DICT = {'barcode': 'barcodesequence',
'primer': 'linkerprimersequence'}

PREP_TEMPLATE_COLUMNS_TARGET_GENE = {
# The following columns are required by QIIME to execute split libraries
'demultiplex': Restriction(
columns={'barcode': 'varchar',
'primer': 'varchar'},
error_msg="Demultiplexing disabled. You will not be able to "
"preprocess your raw data"),
# The following columns are required by Qiita to know how to execute split
# libraries using QIIME over a study with multiple illumina lanes
'demultiplex_multiple': Restriction(
columns={'barcode': 'varchar',
'primer': 'varchar',
'run_prefix': 'varchar'},
error_msg="Demultiplexing with multiple input files disabled. If your "
"raw data includes multiple raw input files, you will not "
"be able to preprocess your raw data")
}

# This list is useful to have if we want to loop through all the restrictions
# in a template-independent manner
ALL_RESTRICTIONS = [SAMPLE_TEMPLATE_COLUMNS, PREP_TEMPLATE_COLUMNS,
PREP_TEMPLATE_COLUMNS_TARGET_GENE]


# A set holding all the controlled columns, useful to avoid recalculating it
def _col_iterator():
for r_set in ALL_RESTRICTIONS:
for restriction in viewvalues(r_set):
for cols in viewkeys(restriction.columns):
yield cols

CONTROLLED_COLS = set(col for col in _col_iterator())
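As a rough illustration of how these restriction dictionaries could be consumed, here is a minimal sketch. It is not the code from base_metadata_template.py, whose diff is not rendered on this page. The helper `find_disabled_features` and the shortened `SAMPLE_TEMPLATE_COLUMNS` are hypothetical, and the sketch assumes validation only compares each restriction's column names against the template header.

```python
from collections import namedtuple

Restriction = namedtuple('Restriction', ['columns', 'error_msg'])

# Shortened copy of the sample template restrictions, for illustration only
SAMPLE_TEMPLATE_COLUMNS = {
    'EBI': Restriction(columns={'collection_timestamp': 'timestamp',
                                'physical_specimen_location': 'varchar'},
                       error_msg="EBI submission disabled"),
    'qiita_main': Restriction(columns={'sample_type': 'varchar',
                                       'description': 'varchar'},
                              error_msg="Processed data approval disabled"),
}


def find_disabled_features(template_columns, restrictions):
    """Hypothetical helper: map each unmet restriction to its missing columns.

    Returns {restriction_name: (error_msg, sorted_missing_columns)}.
    """
    present = set(template_columns)
    disabled = {}
    for name, restriction in restrictions.items():
        missing = set(restriction.columns) - present
        if missing:
            disabled[name] = (restriction.error_msg, sorted(missing))
    return disabled


# A template header that satisfies 'qiita_main' but not 'EBI'
header = ['sample_type', 'description', 'collection_timestamp']
result = find_disabled_features(header, SAMPLE_TEMPLATE_COLUMNS)
for name, (msg, missing) in sorted(result.items()):
    print("%s: %s (missing: %s)" % (name, msg, ', '.join(missing)))
```

Keeping the error message next to the column set means a caller can report which feature is disabled without a second lookup table.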
113 changes: 57 additions & 56 deletions qiita_db/metadata_template/prep_template.py
@@ -7,20 +7,26 @@
# -----------------------------------------------------------------------------

from __future__ import division
from future.utils import viewvalues
from os.path import join
from time import strftime
from copy import deepcopy
import warnings

import pandas as pd

from qiita_core.exceptions import IncompetentQiitaDeveloperError
from qiita_db.exceptions import (QiitaDBColumnError, QiitaDBUnknownIDError,
QiitaDBError, QiitaDBExecutionError)
QiitaDBError, QiitaDBExecutionError,
QiitaDBWarning)
from qiita_db.sql_connection import SQLConnectionHandler
from qiita_db.ontology import Ontology
from qiita_db.util import (convert_to_id,
convert_from_id, get_mountpoint, infer_status)
from .base_metadata_template import BaseSample, MetadataTemplate
from .util import load_template_to_dataframe
from .constants import (TARGET_GENE_DATA_TYPES, RENAME_COLS_DICT,
REQUIRED_TARGET_GENE_COLS)
from .constants import (TARGET_GENE_DATA_TYPES, PREP_TEMPLATE_COLUMNS,
PREP_TEMPLATE_COLUMNS_TARGET_GENE)


class PrepSample(BaseSample):
@@ -66,8 +72,9 @@ class PrepTemplate(MetadataTemplate):
_table_prefix = "prep_"
_column_table = "prep_columns"
_id_column = "prep_template_id"
translate_cols_dict = {'emp_status_id': 'emp_status'}
_sample_cls = PrepSample
_fp_id = convert_to_id("prep_template", "filepath_type")
_filepath_table = 'prep_template_filepath'

@classmethod
def create(cls, md_template, raw_data, study, data_type,
@@ -116,8 +123,13 @@ def create(cls, md_template, raw_data, study, data_type,
data_type_id = convert_to_id(data_type, "data_type", conn_handler)
data_type_str = data_type

pt_cols = PREP_TEMPLATE_COLUMNS
if data_type_str in TARGET_GENE_DATA_TYPES:
pt_cols = deepcopy(PREP_TEMPLATE_COLUMNS)
pt_cols.update(PREP_TEMPLATE_COLUMNS_TARGET_GENE)

md_template = cls._clean_validate_template(md_template, study.id,
data_type_str, conn_handler)
pt_cols)

# Insert the metadata template
# We need the prep_id for multiple calls below, which currently is not
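The copy-then-update step in this hunk can be sketched in isolation. The dictionaries below are shortened stand-ins for the real constants, and `columns_for` is a hypothetical helper that only mirrors the selection logic in `create()`. The point of the sketch is that copying before `update()` keeps the shared module-level constant unchanged for later callers.

```python
from copy import deepcopy

# Stand-ins for the module-level constants (values shortened for illustration)
PREP_TEMPLATE_COLUMNS = {'EBI': {'primer': 'varchar'}}
PREP_TEMPLATE_COLUMNS_TARGET_GENE = {'demultiplex': {'barcode': 'varchar'}}
TARGET_GENE_DATA_TYPES = ['16S', '18S', 'ITS']


def columns_for(data_type):
    """Hypothetical helper mirroring the restriction selection in create()."""
    pt_cols = PREP_TEMPLATE_COLUMNS
    if data_type in TARGET_GENE_DATA_TYPES:
        # Copy first so that update() does not mutate the shared constant
        pt_cols = deepcopy(PREP_TEMPLATE_COLUMNS)
        pt_cols.update(PREP_TEMPLATE_COLUMNS_TARGET_GENE)
    return pt_cols


target = columns_for('16S')
other = columns_for('Metagenomic')
print(sorted(target))
print(sorted(other))
```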
@@ -140,7 +152,7 @@ def create(cls, md_template, raw_data, study, data_type,
"{0} = %s".format(cls._id_column), (prep_id,))

# Check if sample IDs present here but not in sample template
sql = ("SELECT sample_id from qiita.required_sample_info WHERE "
sql = ("SELECT sample_id from qiita.study_sample WHERE "
"study_id = %s")
# Get list of study sample IDs, prep template study IDs,
# and their intersection
@@ -181,40 +193,6 @@ def validate_investigation_type(self, investigation_type):
"Choose from: %s" % (investigation_type,
', '.join(terms)))

@classmethod
def _check_template_special_columns(cls, md_template, data_type):
r"""Checks for special columns based on obj type

Parameters
----------
md_template : DataFrame
The metadata template file contents indexed by sample ids
data_type : str
The data_type of the template.

Returns
-------
set
The set of missing columns

Notes
-----
Sometimes people use different names for the same columns. We just
rename them to use the naming that we expect, so this is normalized
across studies.
"""
# We only have column requirements if the data type of the raw data
# is one of the target gene types
missing_cols = set()
if data_type in TARGET_GENE_DATA_TYPES:
md_template.rename(columns=RENAME_COLS_DICT, inplace=True)

# Check for all required columns for target genes studies
missing_cols = REQUIRED_TARGET_GENE_COLS.difference(
md_template.columns)

return missing_cols

@classmethod
def delete(cls, id_):
r"""Deletes the table from the database
@@ -412,17 +390,11 @@ def generate_files(self):
self.add_filepath(fp)

# creating QIIME mapping file
self.create_qiime_mapping_file(fp)
self.create_qiime_mapping_file()

def create_qiime_mapping_file(self, prep_template_fp):
def create_qiime_mapping_file(self):
"""This creates the QIIME mapping file and links it in the db.

Parameters
----------
prep_template_fp : str
The prep template filepath that should be concatenated to the
sample template go used to generate a new QIIME mapping file

Returns
-------
filepath : str
@@ -432,12 +404,20 @@ def create_qiime_mapping_file(self, prep_template_fp):
------
ValueError
If the prep template is not a subset of the sample template
QiitaDBWarning
If the QIIME-required columns are not present in the template

Notes
-----
We cannot ensure that the QIIME-required columns are present in the
metadata map. However, we have to generate a QIIME-compliant mapping
file. Since the user may need a QIIME mapping file, but not these
QIIME-required columns, we are going to create them and
populate them with the value XXQIITAXX.
"""
rename_cols = {
'barcode': 'BarcodeSequence',
'barcodesequence': 'BarcodeSequence',
'primer': 'LinkerPrimerSequence',
'linkerprimersequence': 'LinkerPrimerSequence',
'description': 'Description',
}

@@ -456,19 +436,38 @@ def create_qiime_mapping_file(self, prep_template_fp):

# reading files via pandas
st = load_template_to_dataframe(sample_template_fp)
pt = load_template_to_dataframe(prep_template_fp)
pt = self.to_dataframe()

st_sample_names = set(st.index)
pt_sample_names = set(pt.index)

if not pt_sample_names.issubset(st_sample_names):
raise ValueError(
"Prep template is not a sub set of the sample template, files:"
"%s %s - samples: %s" % (sample_template_fp, prep_template_fp,
str(pt_sample_names-st_sample_names)))
"Prep template is not a sub set of the sample template, files"
"%s - samples: %s"
% (sample_template_fp,
', '.join(pt_sample_names-st_sample_names)))

mapping = pt.join(st, lsuffix="_prep")
mapping.rename(columns=rename_cols, inplace=True)

# Pre-populate the QIIME-required columns with the value XXQIITAXX
index = mapping.index
placeholder = ['XXQIITAXX'] * len(index)
missing = []
for val in viewvalues(rename_cols):
if val not in mapping:
missing.append(val)
mapping[val] = pd.Series(placeholder, index=index)

if missing:
warnings.warn(
"Some columns required to generate a QIIME-compliant mapping "
"file are not present in the template. A placeholder value "
"(XXQIITAXX) has been used to populate these columns. Missing "
"columns: %s" % ', '.join(missing),
QiitaDBWarning)

# Gets the orginal mapping columns and readjust the order to comply
# with QIIME requirements
cols = mapping.columns.values.tolist()
@@ -486,11 +485,13 @@ def create_qiime_mapping_file(self, prep_template_fp):
self.id, strftime("%Y%m%d-%H%M%S")))

# Save the mapping file
mapping.to_csv(filepath, index_label='#SampleID', na_rep='unknown',
mapping.to_csv(filepath, index_label='#SampleID', na_rep='',
sep='\t')

# adding the fp to the object
self.add_filepath(filepath)
self.add_filepath(
filepath, conn_handler=conn_handler,
fp_id=convert_to_id("qiime_map", "filepath_type"))

return filepath
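The placeholder behaviour added to `create_qiime_mapping_file` can be sketched without pandas or a database. This is a simplified analogue under stated assumptions: rows are plain dicts instead of a DataFrame, `PlaceholderWarning` stands in for `QiitaDBWarning`, and `fill_required` is a hypothetical helper name. Only the placeholder value `XXQIITAXX` and the warn-once behaviour come from the diff.

```python
import warnings


class PlaceholderWarning(UserWarning):
    """Stand-in for QiitaDBWarning, which lives in qiita_db.exceptions."""


def fill_required(rows, required, placeholder='XXQIITAXX'):
    """Add each missing required column to every row and warn once.

    Returns the list of columns that had to be filled in.
    """
    present = set()
    for row in rows:
        present.update(row)
    missing = [col for col in required if col not in present]
    for row in rows:
        for col in missing:
            row[col] = placeholder
    if missing:
        warnings.warn("Missing columns: %s" % ', '.join(missing),
                      PlaceholderWarning)
    return missing


rows = [{'#SampleID': 's1', 'BarcodeSequence': 'ACGT'},
        {'#SampleID': 's2', 'BarcodeSequence': 'TTGA'}]
required = ['BarcodeSequence', 'LinkerPrimerSequence', 'Description']

# Record the warning instead of printing it, so it can be inspected
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    missing = fill_required(rows, required)

print(missing)
print(rows[0])
```

Emitting a warning rather than raising lets the mapping file be generated for users who do not need the QIIME-specific columns, which matches the intent described in the docstring notes.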

27 changes: 7 additions & 20 deletions qiita_db/metadata_template/sample_template.py
@@ -16,11 +16,12 @@
from qiita_db.exceptions import (QiitaDBDuplicateError, QiitaDBError,
QiitaDBUnknownIDError)
from qiita_db.sql_connection import SQLConnectionHandler
from qiita_db.util import get_required_sample_info_status, get_mountpoint
from qiita_db.util import get_mountpoint, convert_to_id
from qiita_db.study import Study
from qiita_db.data import RawData
from .base_metadata_template import BaseSample, MetadataTemplate
from .prep_template import PrepTemplate
from .constants import SAMPLE_TEMPLATE_COLUMNS


class Sample(BaseSample):
@@ -66,9 +67,9 @@ class SampleTemplate(MetadataTemplate):
_table_prefix = "sample_"
_column_table = "study_sample_columns"
_id_column = "study_id"
translate_cols_dict = {
'required_sample_info_status_id': 'required_sample_info_status'}
_sample_cls = Sample
_fp_id = convert_to_id("sample_template", "filepath_type")
_filepath_table = 'sample_template_filepath'

@staticmethod
def metadata_headers():
@@ -87,19 +88,6 @@ def metadata_headers():
"WHERE table_name = 'required_sample_info' "
"ORDER BY column_name")]

@classmethod
def _check_template_special_columns(cls, md_template, study_id):
r"""Checks for special columns based on obj type

Parameters
----------
md_template : DataFrame
The metadata template file contents indexed by sample ids
study_id : int
The study to which the sample template belongs to.
"""
return set()

@classmethod
def create(cls, md_template, study):
r"""Creates the sample template in the database
@@ -123,7 +111,7 @@ def create(cls, md_template, study):

# Clean and validate the metadata template given
md_template = cls._clean_validate_template(md_template, study.id,
study.id, conn_handler)
SAMPLE_TEMPLATE_COLUMNS)

cls._add_common_creation_steps_to_queue(md_template, study.id,
conn_handler, queue_name)
@@ -233,8 +221,7 @@ def extend(self, md_template):
conn_handler.create_queue(queue_name)

md_template = self._clean_validate_template(md_template, self.study_id,
self.study_id,
conn_handler)
SAMPLE_TEMPLATE_COLUMNS)

self._add_common_extend_steps_to_queue(md_template, conn_handler,
queue_name)
@@ -260,7 +247,7 @@ def update(self, md_template):

# Clean and validate the metadata template given
new_map = self._clean_validate_template(md_template, self.id,
conn_handler)
SAMPLE_TEMPLATE_COLUMNS)
# Retrieving current metadata
current_map = self._transform_to_dict(conn_handler.execute_fetchall(
"SELECT * FROM qiita.{0} WHERE {1}=%s".format(self._table,
@@ -60,7 +60,7 @@ def test_add_common_creation_steps_to_queue(self):
def test_clean_validate_template(self):
"""_clean_validate_template raises an error from base class"""
with self.assertRaises(IncompetentQiitaDeveloperError):
MetadataTemplate._clean_validate_template(None, 1, None, None)
MetadataTemplate._clean_validate_template(None, 1, None)


if __name__ == '__main__':