Skip to content

List proc data in search #1039

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
break up functions and add tests
  • Loading branch information
squirrelo committed Apr 9, 2015
commit 917c003de720f14cf581c500316d8286b76d2576
94 changes: 53 additions & 41 deletions qiita_pet/handlers/study_handlers/listing_handlers.py
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,41 @@ def _get_shared_links_for_study(study):
return ", ".join(shared)


def _build_single_study_info(study, info):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This auxuliary functions are missing the docstring. I think that putting the docstring is what will make the return structure more understandable, w/o the necessity to go over the code... (that's also what I meant by breaking the functions...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

# Clean up and add to the study info for HTML purposes
PI = StudyPerson(info['principal_investigator_id'])
status = study.status
if info['pmid'] is not None:
info['pmid'] = ", ".join([pubmed_linkifier([p])
for p in info['pmid']])
else:
info['pmid'] = ""
if info["number_samples_collected"] is None:
info["number_samples_collected"] = 0
info["shared"] = _get_shared_links_for_study(study)
info["num_raw_data"] = len(study.raw_data())
info["status"] = status
info["study_id"] = study.id
info["pi"] = study_person_linkifier((PI.email, PI.name))
del info["principal_investigator_id"]
del info["email"]
return info


def _build_proc_data_info(study, study_proc, proc_samples):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMOO, this function should follow the structure of the previous one. It should be _build_single_proc_data_info and then you call it in a list comprehension from _build_single_study_info. You will have then a function for a single object, and the list comprehension will be cheaper than the appends.

# Build the proc data info list for the child row in datatable
proc_data_info = []
for pid in study_proc[study.id]:
proc_data = ProcessedData(pid)
proc_info = proc_data.processing_info
proc_info['pid'] = pid
proc_info['data_type'] = proc_data.data_type()
proc_info['samples'] = sorted(proc_samples[pid])
proc_info['processed_date'] = str(proc_info['processed_date'])
proc_data_info.append(proc_info)
return proc_data_info


def _build_study_info(user, study_proc=None, proc_samples=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the code on this function looks good, I had a hard time understanding completely what the function is doing. I think it will improve the readability and maintenance of the code if parts of this function are encapsulated in their own functions. For example, the construction of the proc_data_info elements can be encapsulated into their own function. This also makes it easier to test.

Also, IMOO the html introduced here is hard to read. Is it possible to move it to a global so it becomes easier to read (less problems with PEP8)...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored and now renders the gnarly stuff client-side usingt he datatables module itself.

"""builds list of dicts for studies table, with all html formatted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

builds -> Builds and html -> HTML


Expand All @@ -61,21 +96,22 @@ def _build_study_info(user, study_proc=None, proc_samples=None):
-------
infolist: list of dict of lists and dicts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

o.O possible to have a format description, this is really convoluted... Something like:[{study_id: [{foo:bar}]}]. I think it will help to explain what is returned here...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking a bit more on how this looks like and reading a bit further the code, I think you will take advantage of using namedtuples. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly. How do those work when JSON serialized?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, probably worth give it a shot and try different things. The code as it is right now is really hard to follow and I don't think that is maintainable.
On the other hand, are you forced to use JSON? The reason why I'm asking is because it looks like you're using all this information in a javascript function, so you probably can just pass the python object and use the template engine to fill those values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has to be JSON because all this is sent over AJAX to the datatable, which is what allows the table to be dynamic for each metadata search.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it looks like serializing namedtuples to json is a bit tricky http://stackoverflow.com/q/5906831/3746629. Giving this, I think that breaking the function in multiple functions (as I suggested before) and introducing a highly detailed documentation, will help to make this code maintainable.

IMOO, this function should only create the list. The outer dictionary should be created by another function, and the inner dictionary by another function. This will put a logical break up in the code that will be easier to follow, as well as easier testable, since the error (if any) will be in 1 of these 3 functions rather than in a single one as it is right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

study and processed data info for JSON serialiation for datatables
Each dict in the list is a single study, and contains the text

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing the return description.

Notes
-----
Both study_proc and proc_samples must be passed, or neither passed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest also adding a comment on the description of each parameter: must be provided if 'proc_samples' is provided; or something along those lines

"""
build_samples = False
# Logic check to make sure both needed parts passed
build_info = False
if study_proc is not None and proc_samples is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's no tests for these exception cases. Would you mind adding those?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

raise IncompetentQiitaDeveloperError(
'Must pass proc_samples when study_proc given')
elif proc_samples is not None and study_proc is None:
raise IncompetentQiitaDeveloperError(
'Must pass study_proc when proc_samples given')
elif proc_samples is None:
build_info = True
elif study_proc is None:
build_samples = True

# get list of studies for table
study_set = user.user_studies.union(
Expand All @@ -93,48 +129,24 @@ def _build_study_info(user, study_proc=None, proc_samples=None):
study_info = Study.get_info(study_set, cols)

infolist = []
for row, info in enumerate(study_info):
for info in study_info:
# Convert DictCursor to proper dict
info = dict(info)
study = Study(info['study_id'])
PI = StudyPerson(info['principal_investigator_id'])
status = study.status
# if needed, get all the proc data info since no search results passed
if build_info:
proc_data = study.processed_data()
proc_samples = {}
study_proc = {study.id: proc_data}
for pid in proc_data:
proc_samples[pid] = ProcessedData(pid).samples

# Clean up and add to the study info for HTML purposes
if info['pmid'] is not None:
info['pmid'] = ", ".join([pubmed_linkifier([p])
for p in info['pmid']])
else:
info['pmid'] = ""
if info["number_samples_collected"] is None:
info["number_samples_collected"] = 0
info["shared"] = _get_shared_links_for_study(study)
info["num_raw_data"] = len(study.raw_data())
info["status"] = status
info["study_id"] = study.id
info["pi"] = study_person_linkifier((PI.email, PI.name))
del info["principal_investigator_id"]
del info["email"]
# Build the processed data info for the study if none passed
if build_samples:
proc_data = study.processed_data()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation in this block has a few extra spaces.

proc_samples = {}
study_proc = {study.id: proc_data}
for pid in proc_data:
proc_samples[pid] = ProcessedData(pid).samples

study_info = _build_single_study_info(study, info)
# Build the proc data info list for the child row in datatable
proc_data_info = []
for pid in study_proc[study.id]:
proc_data = ProcessedData(pid)
proc_info = proc_data.processing_info
proc_info['pid'] = pid
proc_info['data_type'] = proc_data.data_type()
proc_info['samples'] = sorted(proc_samples[pid])
proc_info['processed_date'] = str(proc_info['processed_date'])
proc_data_info.append(proc_info)
info["proc_data_info"] = proc_data_info

infolist.append(info)
study_info["proc_data_info"] = _build_proc_data_info(study, study_proc,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is weird that this is done here. If "proc_data_info" is part of the study info (which it is) it should be created on the _build_single_study_info. Just move that call to the end of that function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to make a clear separation between proc data and study info. It also makes it so I don't have to pass as many things into the single_study_info function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that there is no separation. The "proc_data_info" is added
to the "study_info" dictionary, which means that the proc_data_info is part
of the study_info. That is why I think it has to be added to the
"_build_sinfle_study_info", because that is info that belongs to the study.
You're just building it ad-hoc inside the other function, instead of
building it in 2 different places. My point is that if at some point we
need the "_build_single_study_info" in another part of the code, we will
need to remember to call "_build_proc_data_info" independently, which shows
a deficiency on the code structure design.

2015-04-12 16:48 GMT-07:00 Joshua Shorenstein notifications@github.com:

In qiita_pet/handlers/study_handlers/listing_handlers.py
#1039 (comment):

  •        "pmid": pmids,
    
  •        "status": status,
    

- "abstract": info["study_abstract"]

  •    })
    
  •    # Build the processed data info for the study if none passed
    
  •    if build_samples:
    
  •            proc_data = study.processed_data()
    
  •            proc_samples = {}
    
  •            study_proc = {study.id: proc_data}
    
  •            for pid in proc_data:
    
  •                proc_samples[pid] = ProcessedData(pid).samples
    
  •    study_info = _build_single_study_info(study, info)
    
  •    # Build the proc data info list for the child row in datatable
    
  •    study_info["proc_data_info"] = _build_proc_data_info(study, study_proc,
    

This is to make a clear separation between proc data and study info. It
also makes it so I don't have to pass as many things into the
single_study_info function.


Reply to this email directly or view it on GitHub
https://github.com/biocore/qiita/pull/1039/files#r28208973.

Jose Navas

proc_samples)

infolist.append(study_info)
return infolist


Expand Down
90 changes: 89 additions & 1 deletion qiita_pet/test/test_study_handlers.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,13 +7,65 @@
from qiita_db.util import get_count, check_count
from qiita_db.user import User
from qiita_pet.handlers.study_handlers.listing_handlers import (
_get_shared_links_for_study, _build_study_info)
_get_shared_links_for_study, _build_study_info, _build_single_study_info,
_build_proc_data_info)


class TestHelpers(TestHandlerBase):
database = True

def setUp(self):
self.single_exp = {
'study_id': 1,
'status': 'private',
'study_abstract':
'This is a preliminary study to examine the microbiota '
'associated with the Cannabis plant. Soils samples '
'from the bulk soil, soil associated with the roots, '
'and the rhizosphere were extracted and the DNA '
'sequenced. Roots from three independent plants of '
'different strains were examined. These roots were '
'obtained November 11, 2011 from plants that had been '
'harvested in the summer. Future studies will attempt '
'to analyze the soils and rhizospheres from the same '
'location at different time points in the plant '
'lifecycle.',
'metadata_complete': True,
'study_title':
'Identification of the Microbiomes for Cannabis Soils',
'num_raw_data': 4,
'number_samples_collected': 27,
'shared':
'<a target="_blank" href="mailto:shared@foo.bar">Shared</a>',
'pmid': '<a target="_blank" href="http://www.ncbi.nlm.nih.gov/'
'pubmed/123456">123456</a>, <a target="_blank" href="http:'
'//www.ncbi.nlm.nih.gov/pubmed/7891011">7891011</a>',
'pi': '<a target="_blank" href="mailto:PI_dude@foo.bar">'
'PIDude</a>'
}
self.proc_data_exp = [{
'pid': 1,
'processed_date': '2012-10-01 09:30:27',
'data_type': '18S',
'algorithm': 'uclust',
'reference_name': 'Greengenes',
'reference_version': '13_8',
'taxonomy_filepath': 'GreenGenes_13_8_97_otu_taxonomy.txt',
'sequence_filepath': 'GreenGenes_13_8_97_otus.fasta',
'tree_filepath': 'GreenGenes_13_8_97_otus.tree',
'similarity': 0.97,
'enable_rev_strand_match': True,
'suppress_new_clusters': True,
'samples': ['1.SKB1.640202', '1.SKB2.640194', '1.SKB3.640195',
'1.SKB4.640189', '1.SKB5.640181', '1.SKB6.640176',
'1.SKB7.640196', '1.SKB8.640193', '1.SKB9.640200',
'1.SKD1.640179', '1.SKD2.640178', '1.SKD3.640198',
'1.SKD4.640185', '1.SKD5.640186', '1.SKD6.640190',
'1.SKD7.640191', '1.SKD8.640184', '1.SKD9.640182',
'1.SKM1.640183', '1.SKM2.640199', '1.SKM3.640197',
'1.SKM4.640180', '1.SKM5.640177', '1.SKM6.640187',
'1.SKM7.640188', '1.SKM8.640201', '1.SKM9.640192']
}]
self.exp = [{
'study_id': 1,
'status': 'public',
Expand Down Expand Up @@ -72,6 +124,42 @@ def test_get_shared_links_for_study(self):
exp = '<a target="_blank" href="mailto:shared@foo.bar">Shared</a>'
self.assertEqual(obs, exp)

def test_build_single_study_info(self):
study_info = {
'study_id': 1,
'email': 'test@foo.bar',
'principal_investigator_id': 3,
'pmid': ['123456', '7891011'],
'study_title':
'Identification of the Microbiomes for Cannabis Soils',
'metadata_complete': True,
'number_samples_collected': 27,
'study_abstract':
'This is a preliminary study to examine the microbiota '
'associated with the Cannabis plant. Soils samples '
'from the bulk soil, soil associated with the roots, '
'and the rhizosphere were extracted and the DNA '
'sequenced. Roots from three independent plants of '
'different strains were examined. These roots were '
'obtained November 11, 2011 from plants that had been '
'harvested in the summer. Future studies will attempt '
'to analyze the soils and rhizospheres from the same '
'location at different time points in the plant '
'lifecycle.'
}
obs = _build_single_study_info(Study(1), study_info)
for key, val in self.single_exp.iteritems():
if obs[key] != val:
print key
print obs[key], val
self.assertEqual(obs, self.single_exp)

def test_build_proc_data_info(self):
study_proc = {1: [1]}
proc_samples = {1: self.proc_data_exp[0]['samples']}
obs = _build_proc_data_info(Study(1), study_proc, proc_samples)
self.assertEqual(obs, self.proc_data_exp)

def test_build_study_info(self):
ProcessedData(1).status = 'public'
obs = _build_study_info(User('test@foo.bar'))
Expand Down