Raise error for disallowed columns #992

Closed
9 changes: 8 additions & 1 deletion qiita_db/metadata_template.py
@@ -750,7 +750,7 @@ def _table_name(cls, obj_id):

     @classmethod
     def _check_special_columns(cls, md_template, obj):
-        r"""Checks for special columns based on obj type
+        r"""Checks for special columns based on obj type, and invalid col names

         Parameters
         ----------
@@ -760,6 +760,13 @@ def _check_special_columns(cls, md_template, obj):
             The obj to which the metadata template belongs to. Study in case
             of SampleTemplate and RawData in case of PrepTemplate
         """
+        # Check disallowed col names
+        disallowed = {'study_id', 'processed_data_id'}
Contributor

👎👎👎

I strongly disagree with this line. These columns are disallowed because they are the output columns of the search engine. If a metadata template contains a column with the same name as one of the search engine's output columns, the search engine breaks: the SQL query ends up with multiple columns of the same name, and the reference is ambiguous.
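
As an illustration of the ambiguity described above, here is a minimal sketch using Python's built-in sqlite3 module. The tables `study` and `sample_1` are hypothetical and are not Qiita's schema. Qiita itself runs on PostgreSQL, which also rejects ambiguous column references, so SQLite is only a convenient stand-in here.

```python
import sqlite3


def ambiguous_column_error():
    """Return the error message raised by an ambiguous column reference."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical tables: both expose a column called study_id.
    conn.execute("CREATE TABLE study (study_id INTEGER, title TEXT)")
    conn.execute("CREATE TABLE sample_1 (sample_id TEXT, study_id TEXT)")
    try:
        # The unqualified study_id could come from either table,
        # so the database refuses to guess.
        conn.execute(
            "SELECT study_id FROM study "
            "JOIN sample_1 ON study.title = sample_1.sample_id"
        )
    except sqlite3.OperationalError as err:
        return str(err)
    finally:
        conn.close()
    return None


print(ambiguous_column_error())
```

The query fails before any rows are read, which is why a clashing column name in a template would break every search that touches it.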

My recommended solution, which I presented offline to @squirrelo and with which he disagrees, is to import the list of output columns from the search engine and disallow them here. The main issue with the current approach is that the list of columns exists both in the search engine and here, so a developer who modifies the search engine's output columns has to remember to update this list as well.

We have already had an issue caused by trusting the developer to do the right thing (the purge_filepaths function issue), and I do not agree with introducing another spot where this can occur. I think that minimizing code duplication and developer burden is always the right path forward, rather than relying on an all-caps comment, as @squirrelo suggested ...
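
The single-source-of-truth idea in this comment could look roughly like the sketch below. Everything in it is hypothetical: `SEARCH_OUTPUT_COLUMNS`, `build_select`, and `disallowed_in` are illustrative names, not part of qiita_db, and where such a constant should live is exactly what this thread debates.

```python
# Hypothetical single definition of the search engine's output columns.
SEARCH_OUTPUT_COLUMNS = ("study_id", "processed_data_id")


def build_select(extra_columns):
    """Build the SELECT list a (hypothetical) search query would emit."""
    return "SELECT " + ", ".join(SEARCH_OUTPUT_COLUMNS + tuple(extra_columns))


def disallowed_in(columns):
    """Return, sorted, the template columns that clash with search output."""
    return sorted(set(SEARCH_OUTPUT_COLUMNS).intersection(columns))


print(build_select(["ph"]))
print(disallowed_in(["ph", "study_id"]))
```

Because both functions read the same tuple, changing the search output changes the validation with it, and no second list has to be kept in sync by hand.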

Contributor Author

The issue is that, while your idea works for a single object, it does not scale to multiple objects. If we suddenly need disallowed columns in the analysis, job, and ontology objects in the future, what then? The function would still need manual edits to reflect those new disallowed columns, which negates the benefit described above.

Member

Contributor

@antgonza 👍

Contributor Author

@antgonza 👍

Contributor Author

Note also, though, that Jose's idea violates the prime directive of qiita_db objects never importing from any other qiita_db object file. We've kept that going until now, so we should NOT break that in my opinion.

Contributor Author

Actually, this is already violated in the metadata_template.py file, so we need to change that ASAP unless we are dropping that directive and creating circular import fun again.

Contributor

Would you mind creating an issue about this?

On (Mar-17-15|19:52), Joshua Shorenstein wrote:

> @@ -760,6 +760,13 @@ def _check_special_columns(cls, md_template, obj):
>              The obj to which the metadata template belongs to. Study in case
>              of SampleTemplate and RawData in case of PrepTemplate
>          """
> +        # Check disallowed col names
> +        disallowed = {'study_id', 'processed_data_id'}
>
> Actually, this is already violated in the metadata_template.py file, so we need to change that ASAP unless we are dropping that directive and creating circular import fun again.



+        invalid = disallowed.intersection(md_template.columns)
+        if len(invalid) > 0:
+            raise QiitaDBColumnError("Disallowed column names found! "
+                                     "Please change these column names: %s" %
+                                     ", ".join(invalid))
         # Check required columns
         missing = set(cls.translate_cols_dict.values()).difference(md_template)
         if not missing:
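
For readers who want to try the check outside Qiita, here is a standalone sketch of the same logic. `ColumnError` is a hypothetical stand-in for `QiitaDBColumnError`, and the columns are passed as a plain list instead of a DataFrame. There is one deliberate difference from the diff: the invalid names are sorted, because joining a set directly puts them in an arbitrary order in the message.

```python
class ColumnError(Exception):
    """Hypothetical stand-in for QiitaDBColumnError."""


# Same two names as in the diff above.
DISALLOWED = frozenset({"study_id", "processed_data_id"})


def check_disallowed(columns, disallowed=DISALLOWED):
    """Raise ColumnError if any column name is in the disallowed set."""
    # Sorting makes the error message deterministic.
    invalid = sorted(disallowed.intersection(columns))
    if invalid:
        raise ColumnError("Disallowed column names found! "
                          "Please change these column names: %s" %
                          ", ".join(invalid))


try:
    check_disallowed(["ph", "study_id", "processed_data_id"])
except ColumnError as err:
    print(err)
```

A deterministic message is mostly useful for tests and logs that compare the text of the error.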
15 changes: 15 additions & 0 deletions qiita_db/test/test_metadata_template.py
@@ -1101,6 +1101,13 @@ def test_create_already_prefixed_samples(self):
                ['2.Sample3', "Value for sample 3"]]
         self.assertEqual(obs, exp)

+    def test_create_disallowed_column(self):
+        for key in self.metadata_dict:
+            self.metadata_dict[key].update({"study_id": "NOOOOOOOOO"})
+        df = pd.DataFrame.from_dict(self.metadata_dict, orient='index')
+        with self.assertRaises(QiitaDBColumnError):
+            SampleTemplate.create(df, self.new_study)
+
     def test_delete(self):
         """Deletes Sample template 1"""
         SampleTemplate.create(self.metadata, self.new_study)
@@ -1714,6 +1721,14 @@ def test_create_bad_sample_names(self):
             PrepTemplate.create(self.metadata, self.new_raw_data,
                                 self.test_study, self.data_type)

+    def test_create_disallowed_column(self):
+        for key in self.metadata_dict:
+            self.metadata_dict[key].update({"study_id": "NOOOOOOOOO"})
+        df = pd.DataFrame.from_dict(self.metadata_dict, orient='index')
+        with self.assertRaises(QiitaDBColumnError):
+            PrepTemplate.create(df, self.new_raw_data,
+                                self.test_study, self.data_type)
+
     def test_create_unknown_sample_names(self):
         # set two real and one fake sample name
         self.metadata_dict['NOTREAL'] = self.metadata_dict['SKB7.640196']
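
The two new tests exercise only `study_id`. Below is a sketch of how every disallowed name could be covered with `subTest`. `create_template` and `ColumnError` are hypothetical stand-ins for the `SampleTemplate.create` / `PrepTemplate.create` calls and for `QiitaDBColumnError`, so the example runs without a Qiita database.

```python
import unittest

# Same two names the PR disallows.
DISALLOWED = ("study_id", "processed_data_id")


class ColumnError(Exception):
    """Hypothetical stand-in for QiitaDBColumnError."""


def create_template(columns):
    """Hypothetical stand-in for create(): reject disallowed column names."""
    invalid = sorted(set(DISALLOWED).intersection(columns))
    if invalid:
        raise ColumnError(", ".join(invalid))
    return list(columns)


class DisallowedColumnTests(unittest.TestCase):
    def test_each_disallowed_column(self):
        # One sub-test per name, so a failure reports which name slipped through.
        for name in DISALLOWED:
            with self.subTest(column=name):
                with self.assertRaises(ColumnError):
                    create_template(["sample_type", name])

    def test_allowed_columns_pass(self):
        self.assertEqual(create_template(["sample_type"]), ["sample_type"])


def run_tests():
    """Run the test case and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(DisallowedColumnTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` executes both test methods. Looping over the disallowed names means a name added to the list later is tested without writing another method.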