Skip to content

Conversation

@antgonza
Copy link
Member

@antgonza antgonza commented Sep 3, 2015

Sorry for the large amount of changes. Have no other idea of how to divide nicely, except making the DB changes in a different PR but that's small compared to the rest so decided to leave it here.

General comments:

  • Added scientific name (required for EBI submission) to the populate_test_db.sql so we can test
  • Remember that commands.submit_EBI is a work in progress and is just so we have an idea of how the calls will look
  • Removed iter_file_via_list_of_dicts, add_sample, add_sample_prep and add_samples_from_templates as these were required for from-file submission, which we will not support anymore as everything is being loaded from the DB
  • Removed _generate_library_descriptor as this was using/forcing to have some library design fields that are not required anymore
  • Removed all write_*_xml as now we simply call the write/generate functions
  • Removed write_all_xml_files cause this will defeat the idea of having networkx call/submit jobs. See point 2 of this list.
  • Replaced gzopen -> GzipFile cause this will allow us to set mtime (float to calculate gzip's timestamp) for testing. FYI: gzip incorporates the timestamp in the file so we need to know/define the timestamp for testing.
  • Removed all qiita_ware/test/test_data/* as now the files are generated and deleted on the fly.

Who wants to review?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of deleting this lines, can you just check that is returning an empty set? I think this was the original idea of this test but got lost when scientific_name become required. This way we will have a test for a template that is not missing anything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not accurate because you're actually checking the data type on line 153.

Anyways, I don't think the logic here is correct. In here we should be checking the columns required by EBI. In the current code, you're checking all the required columns, for multiple reasons - however I think that the correct thing to do is checking only the EBI columns.

Also, you don't need to actually check the data type - the templates have a property that will return the columns that are required for their datatype:

In [11]: st = SampleTemplate(1)

In [12]: st.columns_restrictions
Out[12]: 
{'EBI': Restriction(columns={'physical_specimen_location': 'varchar', 'collection_timestamp': 'timestamp', 'scientific_name': 'varchar', 'taxon_id': 'integer'}, error_msg='EBI submission disabled'),
 'qiita_main': Restriction(columns={'description': 'varchar', 'physical_specimen_remaining': 'bool', 'dna_extracted': 'bool', 'longitude': 'float8', 'sample_type': 'varchar', 'latitude': 'float8', 'host_subject_id': 'varchar'}, error_msg='Processed data approval disabled')}

In [13]: pt = PrepTemplate(1)

In [14]: pt.columns_restrictions
Out[14]: 
{'EBI': Restriction(columns={'center_name': 'varchar', 'platform': 'varchar', 'primer': 'varchar', 'experiment_design_description': 'varchar', 'library_construction_protocol': 'varchar'}, error_msg='EBI submission disabled'),
 'demultiplex': Restriction(columns={'primer': 'varchar', 'barcode': 'varchar'}, error_msg='Demultiplexing disabled. You will not be able to preprocess your raw data'),
 'demultiplex_multiple': Restriction(columns={'primer': 'varchar', 'barcode': 'varchar', 'run_prefix': 'varchar'}, error_msg='Demultiplexing with multiple input files disabled. If your raw data includes multiple raw input files, you will not be able to preprocess your raw data')}

Given this, I think the correct path here is:

restrictions = self.sample_template.columns_restrictions['EBI'].columns.keys()
restrictions.extend(self.prep_template.columns_restrictions['EBI'].columns.keys()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong feeling about this just thought it was easier to remove than add. If nobody opposes I will only use EBI restrictions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, thinking a bit more about this, you may be able to take advantage of the check_restrictions function from the templates:

pt_missing = self.prep_template.check_restrictions([self.prep_template.column_restrictions['EBI'])
st_missing = self.sample_template.check_restrictions([self.sample_template.column_restrictions['EBI'])

And then let the user know where are the columns missing (i.e. not only telling them which columns are missing but where):

if st_missing:
    # add error_msg for sample template
if pt_missing:
    # add error_msg for prep tempalte

@josenavas
Copy link
Contributor

Thanks for the titanic effort here @antgonza ! I have done one pass and I think your errors will go away after addressing some of my comments

@ElDeveloper
Copy link
Contributor

@antgonza, I think I'm done with my review. Looks good so far, but it would definitely be helpful if @adamrp reviewed this as well, there's a lot going on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this isn't a private function would it be worthwhile to add the Parameters section in the numpydocs? (To clarify that s is a str)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you didn't modify this, but can you reword this sentence so it fits in a single line in order to follow the numpy standard?

@mortonjt
Copy link
Contributor

mortonjt commented Sep 6, 2015

I did notice that in ebi.py is missing proper doc strings for send_sequences() and send_xml(). Do you guys mind if I fix this in a future PR?

@antgonza
Copy link
Member Author

antgonza commented Sep 8, 2015

@mortonjt thanks for offering but those will be added in step 4 of this branch.

@antgonza
Copy link
Member Author

antgonza commented Sep 8, 2015

Addressed all issues and questions. Can someone do a final review/merge?

@josenavas
Copy link
Contributor

Sure, going over it again!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like my previous comment here was a bit confusing, sorry about that. The scientific_name should be 'root metagenome'.

'1118232' is the taxon_id (which is already present in the template).

IMO, not blocking - but would like to hear from @ElDeveloper

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking at all.

@josenavas
Copy link
Contributor

thanks @antgonza! few more comments.

I've added a couple of concerns to the joy list, as I think that this PR is getting to big and they can be fixed later and/or the final code will change once your work continues moving forward. In short, the issues that I'm concerned are:

  • UTF-8 char errors (add test)
  • Add a test in which the prep template has less samples than the sample template
  • EBISubmission.__init__ is getting long and complex, so probably we want to revisit it. I can see this function changing once we move all the steps so I don't want to discuss it here.

@antgonza
Copy link
Member Author

antgonza commented Sep 8, 2015

Thanks for adding. The other issues (not on the list) have been addressed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, the date_to_hold parameter should be renamed. To me, this name implies that the release should be held until today.

Also, this should default to None, then in the body of the function it should be set to date.today() (or today + 365 days) if it is None. Otherwise, this will not behave as you expect it to. As it is now, date.today() will be called only once, when the def is first executed (so, only once each time Qiita is launched, I assume), and the date_to_hold will be the same for all submissions, regardless of the actual submission date.

@adamrp
Copy link
Contributor

adamrp commented Sep 8, 2015

A few comments on the code, but it looks good to me. It's great how much code can be deleted now that we are not relying at all on files and can instead use the data model! 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should be dashes (-) and not underscores (_) for numpydoc

@antgonza
Copy link
Member Author

antgonza commented Sep 9, 2015

Ping! @adamrp @josenavas @mortonjt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really minor but EB.Submission.valid_platforms -> EBISubmission.valid_platforms

josenavas added a commit that referenced this pull request Sep 9, 2015
@josenavas josenavas merged commit d0ae1c4 into qiita-spots:ebi-improvements Sep 9, 2015
@josenavas
Copy link
Contributor

Las commit was modifying only documentation - no reason to wait for the tests 😄 thanks @antgonza !!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants