Skip to content

Issue 855 #900

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 49 commits into from
Mar 27, 2015
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
49 commits
Select commit Hold shift + click to select a range
be2186b
Modifying the database
josenavas Mar 2, 2015
5188ef0
Modifying populate_test_db.sql to support the new db schema
josenavas Mar 2, 2015
905b14b
Fixing column naming
josenavas Mar 3, 2015
93a8760
Fixing study object
josenavas Mar 3, 2015
6c3d5a0
Fixing the ProcessedData object
josenavas Mar 3, 2015
60e5276
Fixing the tests for base.py
josenavas Mar 3, 2015
2d7fff4
Fixing more tests
josenavas Mar 3, 2015
5db7e9e
Fixing all qiita_db tests
josenavas Mar 3, 2015
90da22c
All tests passing again
josenavas Mar 3, 2015
2db9c91
Solving merge conflicts
josenavas Mar 4, 2015
5e7d687
Moving status changer buttons from the study to the processed data
josenavas Mar 6, 2015
09c7586
Easy status on processed data
josenavas Mar 6, 2015
688c250
Adding status to preprocessed data
josenavas Mar 6, 2015
d9c73ca
Easy status on preprocessed data
josenavas Mar 6, 2015
ef7b3ec
Adding status to prep template
josenavas Mar 7, 2015
ba42a92
Easy status on prep template
josenavas Mar 10, 2015
b13055a
Adding status to raw data
josenavas Mar 10, 2015
f32d47b
Easy status on raw data
josenavas Mar 10, 2015
fe9c860
Merge branch 'master' of github.com:qiime/QiiTa into issue-855
josenavas Mar 10, 2015
a9085dd
Fixing test failure
josenavas Mar 10, 2015
f851b8d
Expose only public *data through public study access
josenavas Mar 11, 2015
b849eaf
solving merge conflicts
josenavas Mar 11, 2015
b93d037
Adding get_by_status and get_by_status_group_by_study funcs to proces…
josenavas Mar 17, 2015
591e004
Change the awaiting approval list to retrieve the information by proc…
josenavas Mar 17, 2015
2382017
BUG: the admin user now have access to all data again...
josenavas Mar 17, 2015
fdac7da
Fixing merge conflicts
josenavas Mar 17, 2015
4da919c
Fixing merge conflicts
josenavas Mar 18, 2015
b8b205f
Fixing merge conflicts
josenavas Mar 19, 2015
51696a8
Changing meta_util to reflect status change
josenavas Mar 19, 2015
2c70328
Removing commented code
josenavas Mar 19, 2015
9a5d14f
Update dbschema and html to reflect the current database layout
josenavas Mar 19, 2015
0924257
Fixing bug
josenavas Mar 19, 2015
5de46d2
Pep 8 ...
josenavas Mar 19, 2015
ee95fe4
Fixing commands.sh
josenavas Mar 19, 2015
736ad4e
Fixing commands.sh
josenavas Mar 20, 2015
32a6984
Flake 8 again...
josenavas Mar 20, 2015
bb0cb32
Addressing @squirrelo's comments
josenavas Mar 20, 2015
78b8aea
Merge branch 'master' of github.com:qiime/QiiTa into issue-855
josenavas Mar 23, 2015
d17f9a1
ON to USING
josenavas Mar 25, 2015
8281e3c
Extracting status inference code to its own function to avoid code du…
josenavas Mar 25, 2015
57df5c7
Modify status functions to use the util infer status
josenavas Mar 25, 2015
bb1dd25
Adding test for non-status value
josenavas Mar 25, 2015
f9a9e10
Addressing remaining comments
josenavas Mar 25, 2015
84a9ed4
Addressing @ElDeveloper's comments
josenavas Mar 26, 2015
67d8fb2
Addressing @antgonza's comments
josenavas Mar 26, 2015
a09f97f
Increasing timeout
josenavas Mar 26, 2015
68bc8ae
Checking another version of tornado
josenavas Mar 26, 2015
46b84ef
Trying with docker containers
josenavas Mar 26, 2015
3bca14e
Changing back the tornado version
josenavas Mar 26, 2015
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .travis.yml
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
language: python
sudo: false
env:
global:
- PYTHON_VERSION=2.7
Expand Down
177 changes: 175 additions & 2 deletions qiita_db/data.py
Original file line number Diff line number Diff line change
Expand Up @@ -85,10 +85,11 @@
from .base import QiitaObject
from .logger import LogEntry
from .sql_connection import SQLConnectionHandler
from .exceptions import QiitaDBError, QiitaDBUnknownIDError
from .exceptions import QiitaDBError, QiitaDBUnknownIDError, QiitaDBStatusError
from .util import (exists_dynamic_table, insert_filepaths, convert_to_id,
convert_from_id, purge_filepaths, get_filepath_id,
get_mountpoint, move_filepaths_to_upload_folder)
get_mountpoint, move_filepaths_to_upload_folder,
infer_status)


class BaseData(QiitaObject):
Expand Down Expand Up @@ -615,6 +616,58 @@ def remove_filepath(self, fp):
# Delete the files, if they are not used anywhere
purge_filepaths(conn_handler)

def status(self, study):
"""The status of the raw data within the given study

Parameters
----------
study : Study
The study that is looking to the raw data status

Returns
-------
str
The status of the raw data

Raises
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to remove this section as there are not specific exceptions raised in this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually forgot to add the Error raised 😲

------
QiitaDBStatusError
If the raw data does not belong to the passed study

Notes
-----
Given that a raw data can be shared by multiple studies, we need to
know under which context (study) we want to check the raw data status.
The rationale is that a raw data object can contain data from multiple
studies, so the raw data can have multiple status at the same time.
We then check the processed data generated to infer the status of the
raw data.
"""
if self._id not in study.raw_data():
raise QiitaDBStatusError(
"The study %s does not have access to the raw data %s"
% (study.id, self.id))

conn_handler = SQLConnectionHandler()
sql = """SELECT processed_data_status
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking, but these would be cleaner/easier to read if you used USING instead of ON

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I changed all instances that I've added in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, that's some nice shorthand to know! Thanks @squirrelo

FROM qiita.processed_data_status pds
JOIN qiita.processed_data pd
USING (processed_data_status_id)
JOIN qiita.preprocessed_processed_data ppd_pd
USING (processed_data_id)
JOIN qiita.prep_template_preprocessed_data pt_ppd
USING (preprocessed_data_id)
JOIN qiita.prep_template pt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the prep template joins in here? If we just need to get up to the processed data, using raw_data, study_raw_data, and study_preprocessed_data would be easier and faster than prep template.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I have to check the specific processed data that has been generated from this raw data, not any processed data. That path is only available through the prep template.

USING (prep_template_id)
JOIN qiita.raw_data rd
USING (raw_data_id)
JOIN qiita.study_processed_data spd
USING (processed_data_id)
WHERE pt.raw_data_id=%s AND spd.study_id=%s"""
pd_statuses = conn_handler.execute_fetchall(sql, (self._id, study.id))

return infer_status(pd_statuses)


class PreprocessedData(BaseData):
r"""Object for dealing with preprocessed data
Expand Down Expand Up @@ -995,6 +1048,34 @@ def processing_status(self, state):
"UPDATE qiita.{0} SET processing_status=%s WHERE "
"preprocessed_data_id=%s".format(self._table), (state, self.id))

@property
def status(self):
"""The status of the preprocessed data

Returns
-------
str
The status of the preprocessed_data

Notes
-----
The status of a preprocessed data is inferred by the status of the
processed data generated from this preprocessed data. If no processed
data has been generated with this preprocessed data; then the status
is 'sandbox'.
"""
conn_handler = SQLConnectionHandler()
sql = """SELECT processed_data_status
FROM qiita.processed_data_status pds
JOIN qiita.processed_data pd
USING (processed_data_status_id)
JOIN qiita.preprocessed_processed_data ppd_pd
USING (processed_data_id)
WHERE ppd_pd.preprocessed_data_id=%s"""
pd_statuses = conn_handler.execute_fetchall(sql, (self._id,))

return infer_status(pd_statuses)


class ProcessedData(BaseData):
r"""Object for dealing with processed data
Expand All @@ -1020,6 +1101,61 @@ class ProcessedData(BaseData):
_study_processed_table = "study_processed_data"
_preprocessed_processed_table = "preprocessed_processed_data"

@classmethod
def get_by_status(cls, status):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have this for processed data, we should have it for preprocessed and raw data as well to keep things consistent. Anything that has a status really should have this classmethod.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about this... I can do that, although that functionality has not been required for any part of the code right now; so I figured that it will be easier to open an issue for that and do not block more DB development with this PR: #1009

"""Returns id for all ProcessedData with given status

Parameters
----------
status : str
Status to search for

Returns
-------
list of int
All the processed data ids that match the given status
"""
conn_handler = SQLConnectionHandler()
sql = """SELECT processed_data_id FROM qiita.processed_data pd
JOIN qiita.processed_data_status pds
USING (processed_data_status_id)
WHERE pds.processed_data_status=%s"""
result = conn_handler.execute_fetchall(sql, (status,))
if result:
pds = set(x[0] for x in result)
else:
pds = set()

return pds

@classmethod
def get_by_status_grouped_by_study(cls, status):
"""Returns id for all ProcessedData with given status grouped by study

Parameters
----------
status : str
Status to search for

Returns
-------
dict of list of int
A dictionary keyed by study id in which the values are the
processed data ids that belong to that study and match the given
status
"""
conn_handler = SQLConnectionHandler()
sql = """SELECT spd.study_id,
array_agg(pd.processed_data_id ORDER BY pd.processed_data_id)
FROM qiita.processed_data pd
JOIN qiita.processed_data_status pds
USING (processed_data_status_id)
JOIN qiita.study_processed_data spd
USING (processed_data_id)
WHERE pds.processed_data_status = %s
GROUP BY spd.study_id;"""
return dict(conn_handler.execute_fetchall(sql, (status,)))

@classmethod
def create(cls, processed_params_table, processed_params_id, filepaths,
preprocessed_data=None, study=None, processed_date=None,
Expand Down Expand Up @@ -1173,3 +1309,40 @@ def processed_date(self):
return conn_handler.execute_fetchone(
"SELECT processed_date FROM qiita.{0} WHERE "
"processed_data_id=%s".format(self._table), (self.id,))[0]

@property
def status(self):
conn_handler = SQLConnectionHandler()
sql = """SELECT pds.processed_data_status
FROM qiita.processed_data_status pds
JOIN qiita.processed_data pd
USING (processed_data_status_id)
WHERE pd.processed_data_id=%s"""
return conn_handler.execute_fetchone(sql, (self._id,))[0]

@status.setter
def status(self, status):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no sanity check for the status, so when I tested it with status 'chicken' it errors out. Better to use convert_to_id and let that error. This goes for all the other status setters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And by errors out I mean it says it would violate the NOT NULL constraint of the column.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed, note that this is the only status setter implemented on this PR (as all the other status are inferred from the processed data status)

"""Set the status value

Parameters
----------
status : str
The new status

Raises
------
QiitaDBStatusError
If the processed data status is public
"""
if self.status == 'public':
raise QiitaDBStatusError(
"Illegal operation on public processed data")

conn_handler = SQLConnectionHandler()

status_id = convert_to_id(status, 'processed_data_status',
conn_handler=conn_handler)

sql = """UPDATE qiita.{0} SET processed_data_status_id = %s
WHERE processed_data_id=%s""".format(self._table)
conn_handler.execute(sql, (status_id, self._id))
40 changes: 36 additions & 4 deletions qiita_db/meta_util.py
Original file line number Diff line number Diff line change
Expand Up @@ -81,9 +81,8 @@ def get_accessible_filepath_ids(user):
return set(f[0] for f in fpids)

# First, the studies
# There are public, private, and shared studies
study_ids = Study.get_by_status('public') | user.user_studies | \
user.shared_studies
# There are private and shared studies
study_ids = user.user_studies | user.shared_studies
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove the public from here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nevermind, see it below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄


filepath_ids = set()
for study_id in study_ids:
Expand All @@ -107,7 +106,6 @@ def get_accessible_filepath_ids(user):
for rdid in study.raw_data():
for pt_id in RawData(rdid).prep_templates:
# related to https://github.com/biocore/qiita/issues/596
# and https://github.com/biocore/qiita/issues/554
if PrepTemplate.exists(pt_id):
for _id, _ in PrepTemplate(pt_id).get_filepaths():
prep_fp_ids.append(_id)
Expand All @@ -118,6 +116,40 @@ def get_accessible_filepath_ids(user):
in SampleTemplate(study_id).get_filepaths()]
filepath_ids.update(sample_fp_ids)

# Next, the public processed data
processed_data_ids = ProcessedData.get_by_status('public')
for pd_id in processed_data_ids:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems amazingly expensive, especially if this is going to be used to check file access and needs to happen every download. Can you create a PrepTemplate(x).associated_files or some other way of getting these out with far less database calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just following the pattern on this function. I already noticed that this function is quite bad and needs some refactoring, but I think it needs to be done outside of this PR. See #1010

processed_data = ProcessedData(pd_id)

# Add the filepaths of the processed data
pd_fps = (fpid for fpid, _, _ in processed_data.get_filepaths())
filepath_ids.update(pd_fps)

# Each processed data has a preprocessed data
ppd = PreprocessedData(processed_data.preprocessed_data)
ppd_fps = (fpid for fpid, _, _ in ppd.get_filepaths())
filepath_ids.update(ppd_fps)

# Each preprocessed data has a prep template
pt_id = ppd.prep_template
# related to https://github.com/biocore/qiita/issues/596
if PrepTemplate.exists(pt_id):
pt = PrepTemplate(pt_id)
pt_fps = (fpid for fpid, _ in pt.get_filepaths())
filepath_ids.update(pt_fps)

# Each prep template has a raw data
rd = RawData(pt.raw_data)
rd_fps = (fpid for fpid, _, _ in rd.get_filepaths())
filepath_ids.update(rd_fps)

# And each processed data has a study, which has a sample template
st_id = processed_data.study
if SampleTemplate.exists(st_id):
sample_fp_ids = (_id for _id, _
in SampleTemplate(st_id).get_filepaths())
filepath_ids.update(sample_fp_ids)

# Next, analyses
# Same as before, there are public, private, and shared
analysis_ids = Analysis.get_by_status('public') | user.private_analyses | \
Expand Down
32 changes: 31 additions & 1 deletion qiita_db/metadata_template.py
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,7 @@
from .util import (exists_table, get_table_cols, get_emp_status,
get_required_sample_info_status, convert_to_id,
convert_from_id, get_mountpoint, insert_filepaths,
scrub_data)
scrub_data, infer_status)
from .study import Study
from .data import RawData
from .logger import LogEntry
Expand Down Expand Up @@ -2206,6 +2206,36 @@ def create_qiime_mapping_file(self, prep_template_fp):

return filepath

@property
def status(self):
"""The status of the prep template

Returns
-------
str
The status of the prep template

Notes
-----
The status of a prep template is inferred by the status of the
processed data generated from this prep template. If no processed
data has been generated with this prep template; then the status
is 'sandbox'.
"""
conn_handler = SQLConnectionHandler()
sql = """SELECT processed_data_status
FROM qiita.processed_data_status pds
JOIN qiita.processed_data pd
USING (processed_data_status_id)
JOIN qiita.preprocessed_processed_data ppd_pd
USING (processed_data_id)
JOIN qiita.prep_template_preprocessed_data pt_ppd
USING (preprocessed_data_id)
WHERE pt_ppd.prep_template_id=%s"""
pd_statuses = conn_handler.execute_fetchall(sql, (self._id,))

return infer_status(pd_statuses)


def load_template_to_dataframe(fn, strip_whitespace=True):
"""Load a sample or a prep template into a data frame
Expand Down
Loading