Skip to content

Fix metadata obj #1075

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Apr 28, 2015
Merged

Conversation

josenavas
Copy link
Contributor

This PR is built on top of #1074

This PR makes all the needed modifications in the metadata template objects to get all the tests under qiita_db/metadata_template/tests to pass. The PR right now is a bit big, but as the other PR in which this one depends gets merged, the size will be reduced.

'EBI': Restriction(columns={'collection_timestamp': 'timestamp',
'physical_specimen_location': 'varchar'},
error_msg="EBI submission disabled"),
'Qiita_main': Restriction(columns={'sample_type': 'varchar',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I like the casing. This is not an actual issue but looks weird and perhaps all should be lower case ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lowercased!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I don't like about this name is that "qiita main" isn't really meaningful.

Additionally the meaning of each of these keys should be documented somewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you recommend for this? The restrictions under the qiita_main key are restrictions that only apply to our official Qiita main site, but they will not apply on a single user environment...

Re documentation: do you think it's enough to add a comment here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, then the restriction only applies to non-single-user systems right? As for better documenting this, I think we need something more than just a comment in the "constants" file. However I cannot think of a good way to do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re documentation: I will add a comment here and I've created a new issue: #1111

I think I used the wrong words on my previous comment, the idea is that those restrictions only apply to our system. Other labs that may want to have a qiita installation locally with multiple users shouldn't have these restrictions either... Is one of those cases in which our system is more specialized than the rest of the users.

Thinking a bit more about this. I think that having this in a configuration file will allow us to change these restrictions dynamically, and we can have them documented there. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, yeah it makes more sense that way. Otherwise it is one of those
seemingly arbitrary restrictions of the system. If these are only for us
then they should be parametrized, although we do want to provide Qiita
as a service, the software should not be specific to our needs in any
way.

On (Apr-27-15|15:12), josenavas wrote:

@@ -6,10 +6,68 @@

The full license is in the file LICENSE, distributed with this software.

-----------------------------------------------------------------------------

+from collections import namedtuple
+from future.utils import viewkeys, viewvalues
+
+Restriction = namedtuple('Restriction', ['columns', 'error_msg'])
+
+# A dict containing the restrictions that apply to the sample templates
+SAMPLE_TEMPLATE_COLUMNS = {

  • 'EBI': Restriction(columns={'collection_timestamp': 'timestamp',
  •                            'physical_specimen_location': 'varchar'},
    
  •                   error_msg="EBI submission disabled"),
    
  • 'Qiita_main': Restriction(columns={'sample_type': 'varchar',

Re documentation: I will add a comment here and I've created a new issue: #1111

I think I used the wrong words on my previous comment, the idea is that those restrictions only apply to our system. Other labs that may want to have a qiita installation locally with multiple users shouldn't have these restrictions either... Is one of those cases in which our system is more specialized than the rest of the users.

Thinking a bit more about this. I think that having this in a configuration file will allow us to change these restrictions dynamically, and we can have them documented there. What do you think?


Reply to this email directly or view it on GitHub:
https://github.com/biocore/qiita/pull/1075/files#r29197135

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree the idea of making this parametrized. Also, we don't know
when EBI decides to change their requirements. However, in interests of
time, I think we should create an issue and I can solve it as a part of
Alpha 0.2, since that will require some refactor. Does that sound good to
you?

2015-04-27 15:24 GMT-07:00 Yoshiki Vázquez Baeza notifications@github.com:

In qiita_db/metadata_template/constants.py
#1075 (comment):

@@ -6,10 +6,68 @@

The full license is in the file LICENSE, distributed with this software.

-----------------------------------------------------------------------------

+from collections import namedtuple
+from future.utils import viewkeys, viewvalues
+
+Restriction = namedtuple('Restriction', ['columns', 'error_msg'])
+
+# A dict containing the restrictions that apply to the sample templates
+SAMPLE_TEMPLATE_COLUMNS = {

  • 'EBI': Restriction(columns={'collection_timestamp': 'timestamp',
  •                            'physical_specimen_location': 'varchar'},
    
  •                   error_msg="EBI submission disabled"),
    
  • 'Qiita_main': Restriction(columns={'sample_type': 'varchar',

Got it, yeah it makes more sense that way. Otherwise it is one of those
seemingly arbitrary restrictions of the system. If these are only for us
then they should be parametrized, although we do want to provide Qiita as a
service, the software should not be specific to our needs in any way.
… <#14cfcfaf015149db_>
On (Apr-27-15|15:12), josenavas wrote: > @@ -6,10 +6,68 @@ > # The full

license is in the file LICENSE, distributed with this software. >

+from collections import namedtuple > +from future.utils import
viewkeys, viewvalues > + > +Restriction = namedtuple('Restriction',
['columns', 'error_msg']) > + > +# A dict containing the restrictions that
apply to the sample templates > +SAMPLE_TEMPLATE_COLUMNS = { > + 'EBI':
Restriction(columns={'collection_timestamp': 'timestamp', > +
'physical_specimen_location': 'varchar'}, > + error_msg="EBI submission
disabled"), > + 'Qiita_main': Restriction(columns={'sample_type':
'varchar', Re documentation: I will add a comment here and I've created a
new issue: #1111 I think I used
the wrong words on my previous comment, the idea is that those restrictions
only apply to our system. Other labs that may want to have a qiita
installation locally with multiple users shouldn't have these restrictions
either... Is one of those cases in which our system is more specialized
than the rest of the users. Thinking a bit more about this. I think that
having this in a configuration file will allow us to change these
restrictions dynamically, and we can have them documented there. What do
you think? --- Reply to this email directly or view it on GitHub:
https://github.com/biocore/qiita/pull/1075/files#r29197135


Reply to this email directly or view it on GitHub
https://github.com/biocore/qiita/pull/1075/files#r29197966.

Jose Navas

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this sounds good we should resolve this before too long though.
Can you open an issue about this?

On (Apr-27-15|15:26), josenavas wrote:

@@ -6,10 +6,68 @@

The full license is in the file LICENSE, distributed with this software.

-----------------------------------------------------------------------------

+from collections import namedtuple
+from future.utils import viewkeys, viewvalues
+
+Restriction = namedtuple('Restriction', ['columns', 'error_msg'])
+
+# A dict containing the restrictions that apply to the sample templates
+SAMPLE_TEMPLATE_COLUMNS = {

  • 'EBI': Restriction(columns={'collection_timestamp': 'timestamp',
  •                            'physical_specimen_location': 'varchar'},
    
  •                   error_msg="EBI submission disabled"),
    
  • 'Qiita_main': Restriction(columns={'sample_type': 'varchar',

Yeah, I agree the idea of making this parametrized. Also, we don't know
when EBI decides to change their requirements. However, in interests of
time, I think we should create an issue and I can solve it as a part of
Alpha 0.2, since that will require some refactor. Does that sound good to
you?

2015-04-27 15:24 GMT-07:00 Yoshiki Vázquez Baeza notifications@github.com:

In qiita_db/metadata_template/constants.py
#1075 (comment):

@@ -6,10 +6,68 @@

The full license is in the file LICENSE, distributed with this software.

-----------------------------------------------------------------------------

+from collections import namedtuple
+from future.utils import viewkeys, viewvalues
+
+Restriction = namedtuple('Restriction', ['columns', 'error_msg'])
+
+# A dict containing the restrictions that apply to the sample templates
+SAMPLE_TEMPLATE_COLUMNS = {

  • 'EBI': Restriction(columns={'collection_timestamp': 'timestamp',
  •                            'physical_specimen_location': 'varchar'},
    
  •                   error_msg="EBI submission disabled"),
    
  • 'Qiita_main': Restriction(columns={'sample_type': 'varchar',

Got it, yeah it makes more sense that way. Otherwise it is one of those
seemingly arbitrary restrictions of the system. If these are only for us
then they should be parametrized, although we do want to provide Qiita as a
service, the software should not be specific to our needs in any way.
… <#14cfcfaf015149db_>
On (Apr-27-15|15:12), josenavas wrote: > @@ -6,10 +6,68 @@ > # The full

license is in the file LICENSE, distributed with this software. >

+from collections import namedtuple > +from future.utils import
viewkeys, viewvalues > + > +Restriction = namedtuple('Restriction',
['columns', 'error_msg']) > + > +# A dict containing the restrictions that
apply to the sample templates > +SAMPLE_TEMPLATE_COLUMNS = { > + 'EBI':
Restriction(columns={'collection_timestamp': 'timestamp', > +
'physical_specimen_location': 'varchar'}, > + error_msg="EBI submission
disabled"), > + 'Qiita_main': Restriction(columns={'sample_type':
'varchar', Re documentation: I will add a comment here and I've created a
new issue: #1111 I think I used
the wrong words on my previous comment, the idea is that those restrictions
only apply to our system. Other labs that may want to have a qiita
installation locally with multiple users shouldn't have these restrictions
either... Is one of those cases in which our system is more specialized
than the rest of the users. Thinking a bit more about this. I think that
having this in a configuration file will allow us to change these
restrictions dynamically, and we can have them documented there. What do
you think? --- Reply to this email directly or view it on GitHub:
https://github.com/biocore/qiita/pull/1075/files#r29197135


Reply to this email directly or view it on GitHub
https://github.com/biocore/qiita/pull/1075/files#r29197966.

Jose Navas


Reply to this email directly or view it on GitHub:
https://github.com/biocore/qiita/pull/1075/files#r29198175

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done #1113

2015-04-27 15:32 GMT-07:00 Yoshiki Vázquez Baeza notifications@github.com:

In qiita_db/metadata_template/constants.py
#1075 (comment):

@@ -6,10 +6,68 @@

The full license is in the file LICENSE, distributed with this software.

-----------------------------------------------------------------------------

+from collections import namedtuple
+from future.utils import viewkeys, viewvalues
+
+Restriction = namedtuple('Restriction', ['columns', 'error_msg'])
+
+# A dict containing the restrictions that apply to the sample templates
+SAMPLE_TEMPLATE_COLUMNS = {

  • 'EBI': Restriction(columns={'collection_timestamp': 'timestamp',
  •                            'physical_specimen_location': 'varchar'},
    
  •                   error_msg="EBI submission disabled"),
    
  • 'Qiita_main': Restriction(columns={'sample_type': 'varchar',

Yeah, this sounds good we should resolve this before too long though. Can
you open an issue about this?
… <#14cfd0256077b3f3_>
On (Apr-27-15|15:26), josenavas wrote: > @@ -6,10 +6,68 @@ > # The full

license is in the file LICENSE, distributed with this software. >

+from collections import namedtuple > +from future.utils import
viewkeys, viewvalues > + > +Restriction = namedtuple('Restriction',
['columns', 'error_msg']) > + > +# A dict containing the restrictions that
apply to the sample templates > +SAMPLE_TEMPLATE_COLUMNS = { > + 'EBI':
Restriction(columns={'collection_timestamp': 'timestamp', > +
'physical_specimen_location': 'varchar'}, > + error_msg="EBI submission
disabled"), > + 'Qiita_main': Restriction(columns={'sample_type':
'varchar', Yeah, I agree the idea of making this parametrized. Also, we
don't know when EBI decides to change their requirements. However, in
interests of time, I think we should create an issue and I can solve it as
a part of Alpha 0.2, since that will require some refactor. Does that sound
good to you? 2015-04-27 15:24 GMT-07:00 Yoshiki Vázquez Baeza <
notifications@github.com>: > In qiita_db/metadata_template/constants.py >
#1075 (comment): > > >
@@ -6,10 +6,68 @@ > > # The full license is in the file LICENSE,

distributed with this software. > >

+from collections import namedtuple > > +from future.utils import
viewkeys, viewvalues > > + > > +Restriction = namedtuple('Restriction',
['columns', 'error_msg']) > > + > > +# A dict containing the restrictions
that apply to the sample templates > > +SAMPLE_TEMPLATE_COLUMNS = { > > +
'EBI': Restriction(columns={'collection_timestamp': 'timestamp', > > +
'physical_specimen_location': 'varchar'}, > > + error_msg="EBI submission
disabled"), > > + 'Qiita_main': Restriction(columns={'sample_type':
'varchar', > > Got it, yeah it makes more sense that way. Otherwise it is
one of those > seemingly arbitrary restrictions of the system. If these are
only for us > then they should be parametrized, although we do want to
provide Qiita as a > service, the software should not be specific to our
needs in any way. > … <#14cfcfaf015149db_> > On (Apr-27-15|15:12),
josenavas wrote: > @@ -6,10 +6,68 @@ > # The full > license is in the file

LICENSE, distributed with this software. > # >

+from collections import namedtuple > +from future.utils import >
viewkeys, viewvalues > + > +Restriction = namedtuple('Restriction', >
['columns', 'error_msg']) > + > +# A dict containing the restrictions that
apply to the sample templates > +SAMPLE_TEMPLATE_COLUMNS = { > + 'EBI': >
Restriction(columns={'collection_timestamp': 'timestamp', > + >
'physical_specimen_location': 'varchar'}, > + error_msg="EBI submission >
disabled"), > + 'Qiita_main': Restriction(columns={'sample_type': >
'varchar', Re documentation: I will add a comment here and I've created a >
new issue: #1111 I think I used >
the wrong words on my previous comment, the idea is that those restrictions
only apply to our system. Other labs that may want to have a qiita >
installation locally with multiple users shouldn't have these restrictions
either... Is one of those cases in which our system is more specialized >
than the rest of the users. Thinking a bit more about this. I think that >
having this in a configuration file will allow us to change these >
restrictions dynamically, and we can have them documented there. What do >
you think? --- Reply to this email directly or view it on GitHub: >
https://github.com/biocore/qiita/pull/1075/files#r29197135 > > — > Reply
to this email directly or view it on GitHub > <
https://github.com/biocore/qiita/pull/1075/files#r29197966>. > -- Jose
Navas --- Reply to this email directly or view it on GitHub:
https://github.com/biocore/qiita/pull/1075/files#r29198175


Reply to this email directly or view it on GitHub
https://github.com/biocore/qiita/pull/1075/files#r29198597.

Jose Navas

@ElDeveloper
Copy link
Contributor

Can you pull from upstream?

@josenavas
Copy link
Contributor Author

Done!

@ElDeveloper
Copy link
Contributor

@josenavas, from looking at the tests in Travis, it seems like there's a test case that's raising a warning that's not being catched, this might or not be intentional, but would be good to update the tests so there's no warnings being raised, see this for example:

/home/travis/build/biocore/qiita/qiita_db/metadata_template/base_metadata_template.py:583: QiitaDBWarning: Some functionality will be disabled due to missing columns:
    EBI submission disabled: primer.
Check https://github.com/biocore/qiita/wiki/Preparing-Qiita-template-files for a description of these fields.
  QiitaDBWarning)

@josenavas
Copy link
Contributor Author

Ok, I'll update the tests so no warning is raised. There are quite a few instances of those... will ASAP.


Returns
-------
md_template : DataFrame
Cleaned copy of the input md_template

Raises
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the warning being raised when there are missing columns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, I was unsure if the warnings were documented in the same way :-)

@antgonza
Copy link
Member

After these changes go in, do we need to update ftp://ftp.microbio.me/pub/qiita/sample_prep_template_examples.tgz ?

# Remove the sample_id and _id_column columns
db_cols.remove('sample_id')
db_cols.remove(cls._id_column)
headers = sorted(md_template.keys().tolist())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we no longer have to check the column names are unique? Or is this guaranteed by .keys?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point is it assumed that the md_template has been validated and clean (I think it's a safe assumption since this is a private function). Those checks are done in _clean_validate_template. That sounds reasonable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks!

@josenavas
Copy link
Contributor Author

@antgonza we probably will need to update those, good call. I will take a note to update those once the entire branch is ready to be merged in master.

I just found a small bug on the code and I'm fixing it. However, it requires modifying multiple tests. Will ping when I update everything (and including the comments made so far).

error_msg="Processed data approval disabled")
}

# A dict containing the restrictions that apply to the sample templates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sample -> prep

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@josenavas
Copy link
Contributor Author

@antgonza @ElDeveloper this should be ready for another review, thanks guys!

if warning_msg:
warnings.warn(
"Some functionality will be disabled due to missing "
"columns:\n\t%s.\nCheck https://github.com/biocore/qiita/wiki"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a link so the user can access it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should I make it a link? Note that this error is raised in qiita_db. I'm hesitant to add specific html here since this warning can be seen from the CLI...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also thinking about this, but it's going to be tricky since this
can also be raised from a terminal and you would get an HTML formatted
error message, which is not appropriate. Ideas?

On (Apr-27-15|15:22), Antonio Gonzalez wrote:

  •    # Retrieve the headers of the metadata template
    
  •    headers = list(md_template.keys())
    
  •    if warning_msg:
    
  •        warnings.warn(
    
  •            "Some functionality will be disabled due to missing "
    
  •            "columns:\n\t%s.\nCheck https://github.com/biocore/qiita/wiki"
    

Should this be a link so the user can access it?


Reply to this email directly or view it on GitHub:
https://github.com/biocore/qiita/pull/1075/files#r29197809

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the issue with CLI? I think is fine ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ElDeveloper you're the CLI guru, what do you think? I prefer not to add HTML code in the CLI error/warning output, but if more people agrees I'm happy to add it...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a separate conversation/issue. BTW IMOO the parsing should be in python/templates and not in JS.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works!

On (Apr-27-15|16:29), Antonio Gonzalez wrote:

  •    # Retrieve the headers of the metadata template
    
  •    headers = list(md_template.keys())
    
  •    if warning_msg:
    
  •        warnings.warn(
    
  •            "Some functionality will be disabled due to missing "
    
  •            "columns:\n\t%s.\nCheck https://github.com/biocore/qiita/wiki"
    

I think this should be a separate conversation/issue. BTW IMOO the parsing should be in python/templates and not in JS.


Reply to this email directly or view it on GitHub:
https://github.com/biocore/qiita/pull/1075/files#r29202450

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that means that this should stay as it is and we'll improve the qiita_pet error parsing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, can you open an issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Done: #1115

2015-04-27 17:04 GMT-07:00 Antonio Gonzalez notifications@github.com:

In qiita_db/metadata_template/base_metadata_template.py
#1075 (comment):

  •    # Retrieve the headers of the metadata template
    
  •    headers = list(md_template.keys())
    
  •    if warning_msg:
    
  •        warnings.warn(
    
  •            "Some functionality will be disabled due to missing "
    
  •            "columns:\n\t%s.\nCheck https://github.com/biocore/qiita/wiki"
    

yup, can you open an issue?


Reply to this email directly or view it on GitHub
https://github.com/biocore/qiita/pull/1075/files#r29204247.

Jose Navas

@ElDeveloper
Copy link
Contributor

All my comments have been addressed, I only have a question, but that's not a blocking comment. @antgonza can you merge once your comments/questions are addressed?

@josenavas
Copy link
Contributor Author

@antgonza can you merge if there are no outstanding issues?

@josenavas
Copy link
Contributor Author

This is ready to go if there are no more comments!

@ElDeveloper
Copy link
Contributor

LGTM.

antgonza added a commit that referenced this pull request Apr 28, 2015
@antgonza antgonza merged commit 79256ff into qiita-spots:relax-md-req Apr 28, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants