Skip to content

Delete process data #1034

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Apr 3, 2015
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
54 changes: 54 additions & 0 deletions qiita_db/data.py
Original file line number Diff line number Diff line change
Expand Up @@ -1257,6 +1257,60 @@ def create(cls, processed_params_table, processed_params_id, filepaths,
pd.add_filepaths(filepaths, conn_handler)
return cls(pd_id)

@classmethod
def delete(cls, processed_data_id):
"""Removes the processed data with id processed_data_id

Parameters
----------
processed_data_id : int
The processed data id

Raises
------
QiitaDBUnknownIDError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are not raising this error, so no need to list here. In fact, there is no call to exists so probably useful to add.
Also, you're missing the description of the QiitaDBError.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I guess I erased the wrong ones in one of the iterations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs an exists call since line 1276 will do it for you, but I agree about removing the Raises text and adding QiitaDBStatusError instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

;) that's what I was doing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call

If the processed data id doesn't exist
QiitaDBStatusError
If the processed data status is not sandbox
"""
if cls(processed_data_id).status != 'sandbox':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just thinking about this. Should we limit the removal only to sandbox? I think we should be able to remove the processed data in any moment except if it is public (i.e. why we should not allow the removal if it is private?)

Copy link
Member Author

@antgonza antgonza Apr 3, 2015 via email

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fine with me. We can revisit this later and it is an easy change if we decide something different. I'm going to pull down the code to test.

raise QiitaDBStatusError(
"Illegal operation on non sandbox processed data")

conn_handler = SQLConnectionHandler()

analysis_ids = [str(_id[0]) for _id in conn_handler.execute_fetchall(
"SELECT DISTINCT analysis_id FROM qiita.analysis_sample WHERE "
"processed_data_id = {0} ORDER BY "
"analysis_id".format(processed_data_id))]

if analysis_ids:
raise QiitaDBError(
"Processed data %d is linked to (meta)analyses: %s" %
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we improve the error message here?
Processed data %d cannot be removed because it is linked to the following (meta)analysis: %s

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

analysis names would be better than IDs. There is no way to link an ID to an analysis in the UI

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternately, raise error without the IDs and use the Logger to store the IDs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going with adding names and QiitaDBError so the user can see the error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I just thought of if showing names: this will potentially show names of analyses that the user does not have access to. Probably not the end of the world, but could be a privacy issue.

Copy link
Member Author

@antgonza antgonza Apr 3, 2015 via email

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And users should not be able to put it back to sandbox if it was public and
there was analysis using it.

2015-04-03 9:31 GMT-07:00 Antonio Gonzalez notifications@github.com:

In qiita_db/data.py
#1034 (comment):

  •        If the processed data status is not sandbox
    
  •    """
    
  •    if cls(processed_data_id).status != 'sandbox':
    
  •        raise QiitaDBStatusError(
    
  •            "Illegal operation on non sandbox processed data")
    
  •    conn_handler = SQLConnectionHandler()
    
  •    analysis_ids = [str(_id[0]) for _id in conn_handler.execute_fetchall(
    
  •        "SELECT DISTINCT analysis_id FROM qiita.analysis_sample WHERE "
    
  •        "processed_data_id = {0} ORDER BY "
    
  •        "analysis_id".format(processed_data_id))]
    
  •    if analysis_ids:
    
  •        raise QiitaDBError(
    
  •            "Processed data %d is linked to (meta)analyses: %s" %
    

I think that's fine. The only way this could happen is if the owner made
it public, then move it to sandbox, and is trying to delete it. Which will
be a full other beast ...


Reply to this email directly or view it on GitHub
https://github.com/biocore/qiita/pull/1034/files#r27738714.

Jose Navas

(processed_data_id, ", ".join(analysis_ids)))

# delete
queue = "delete_processed_data_%d" % processed_data_id
conn_handler.create_queue(queue)

sql = ("DELETE FROM qiita.preprocessed_processed_data WHERE "
"processed_data_id = {0}".format(processed_data_id))
conn_handler.add_to_queue(queue, sql)

sql = ("DELETE FROM qiita.processed_filepath WHERE "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also needs to remove the filepath table entries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed, see my comment below (about purge_filepaths)

Copy link
Member Author

@antgonza antgonza Apr 3, 2015 via email

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antgonza sorry for the confusion, I was answering to @squirrelo comment about deleting the entry on qiita.filepath. I guess I should have been more explicit :-)

"processed_data_id = {0}".format(processed_data_id))
conn_handler.add_to_queue(queue, sql)

sql = ("DELETE FROM qiita.study_processed_data WHERE "
"processed_data_id = {0}".format(processed_data_id))
conn_handler.add_to_queue(queue, sql)

sql = ("DELETE FROM qiita.processed_data WHERE "
"processed_data_id = {0}".format(processed_data_id))
conn_handler.add_to_queue(queue, sql)

conn_handler.execute_queue(queue)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also needs to delete the files on the filesystem, correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something that we have not discussed as a group, but I will just make a note here. If you take a look, it doesn't even remove the row from qiita.filepath table. We are planning to run "purge_filepaths" in a cron job. That function, identifies which rows from qiita.filepath are not referenced and removes them from the table and from the filesystem. The advantage of this, is that the server is more free to answer requests rather than doing maintenance work, that can be done by a worker.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For both comments: What we are currently doing is to leave the filepath entry (it doesn't affect anything) as deleting files could be pretty expensive. Additionally, the plan is to drop one of the already created functions once we have this in: #814

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, can an issue be opened specifically for that then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #814

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW @josenavas, we discussed this decision as a group, search in your email for "deleting files and automatic jobs"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question then is will that job be fired off as soon as the server starts or will there be a delay? Any delay means that the files could sit around indefinitely, as the user could start the server, quickly do something, and then shut it down again without that cleanup ever firing.

Also I do not have that email in any of my accounts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that @antgonza I found the email. We should write those small parts on the Contributing.md file. Next week, I'm going to put a new version together and I will submit a PR, so we can all review it.

The reason why I thought it was not discussed is because that is handled inconsistently across the code base, so I thought that was not agreed. Thanks for pointing that out!

@squirrelo I forwarded the email to you for your reference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Will you mind moving the discussion about how the job is gonna work
    to the actual issue? That way we will ensure everybody's comments are
    taken into account when that's resolved. Thanks!
  2. Thread to which you replied:
    https://groups.google.com/forum/#!searchin/qiita-dev/deleting$20files$20and$20automatic$20jobs/qiita-dev/X8pu1fHASuY/TT2_fARyS50J

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Will just conclude by saying if this pull request is leveraging that expected functionality, that functionality should probably go in pretty quickly after this request.

@property
def preprocessed_data(self):
r"""The preprocessed data id used to generate the processed data"""
Expand Down
15 changes: 15 additions & 0 deletions qiita_db/test/test_data.py
Original file line number Diff line number Diff line change
Expand Up @@ -785,6 +785,21 @@ def test_create(self):
# preprocessed_data_id, processed_Data_id
self.assertEqual(obs, [[1, 2]])

def test_delete(self):
"""Correctly deletes a processed data and raises an error if it has
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, these descriptions have to be one-liners, otherwise they do not show up correctly (or at least they did not show up before...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're missing 1 test: the processed data cannot be removed because it is linked to a (meta)analysis.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

been used in a (meta)analysis or if it's public"""
pd = ProcessedData.create(self.params_table, self.params_id,
self.filepaths,
preprocessed_data=self.preprocessed_data,
processed_date=self.date)
ProcessedData.delete(pd.id)

with self.assertRaises(QiitaDBUnknownIDError):
ProcessedData.delete(pd.id)

with self.assertRaises(QiitaDBStatusError):
ProcessedData.delete(1)

def test_create_no_date(self):
"""Correctly adds a processed data with no date on it"""
# All the other settings have been already tested on test_create
Expand Down
30 changes: 29 additions & 1 deletion qiita_pet/handlers/study_handlers/description_handlers.py
Original file line number Diff line number Diff line change
Expand Up @@ -697,6 +697,33 @@ def delete_prep_template(self, study, user, callback):

callback((msg, msg_level, 'raw_data_tab', prep_id, None))

def delete_processed_data(self, study, user, callback):
"""Delete the selected processed data

Parameters
----------
study : Study
The current study object
user : User
The current user object
callback : function
The callback function to call with the results once the processing
is done
"""
pd_id = int(self.get_argument('processed_data_id'))

try:
ProcessedData.delete(pd_id)
msg = ("Processed data %d has been deleted" % pd_id)
msg_level = "success"
pd_id = None
except Exception as e:
msg = ("Couldn't remove processed data %d: %s" %
(pd_id, str(e)))
msg_level = "danger"

callback((msg, msg_level, 'processed_data_tab', pd_id, None))

@authenticated
def get(self, study_id):
study, user, full_access = self._get_study_and_check_access(study_id)
Expand Down Expand Up @@ -728,7 +755,8 @@ def post(self, study_id):
make_sandbox=self.make_sandbox,
update_investigation_type=self.update_investigation_type,
delete_raw_data=self.delete_raw_data,
delete_prep_template=self.delete_prep_template)
delete_prep_template=self.delete_prep_template,
delete_processed_data=self.delete_processed_data)

# Get the action that we need to perform
action = self.get_argument("action", None)
Expand Down
20 changes: 20 additions & 0 deletions qiita_pet/templates/study_description.html
Original file line number Diff line number Diff line change
Expand Up @@ -222,6 +222,26 @@
}
}

function delete_processed_data(processed_data_id) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outside the scope of this pull request, but we really should start using the AJAX POST capabilities instead of this form creation/submission we're currently doing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, can you add an issue so we do not forget?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue #1036

Copy link
Member Author

@antgonza antgonza Apr 3, 2015 via email

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (confirm('Are you sure you want to delete processed data ID: ' + processed_data_id + '?')) {
var form = $("<form>")
.attr("action", window.location.href)
.attr("method", "post")
.append($("<input>")
.attr("type", "hidden")
.attr("name", "processed_data_id")
.attr("value", processed_data_id))
.append($("<input>")
.attr("type", "hidden")
.attr("name", "action")
.attr("value", "delete_processed_data"));
$("body").append(form);
form.submit();
} else {
return false;
}
}

function make_public(pd_id) {
if (confirm("Are you sure you want to make this study public?")) {
var form = $("<form>")
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,10 +6,13 @@
<li class="active"><a href="#add_processed_data_tab" role="tab" data-toggle="tab">Add processed data</a></li>
<!-- Create the tabs for each processed data -->
{% for pd_id, pd, (class_icon1, class_icon2, color) in available_processed_data %}
<li><a href="#processed_data_info_{{pd_id}}" role="tab" data-toggle="tab">ID: {{pd_id}}&nbsp;
<div class="{{class_icon1}}" style="color: {{color}};"></div>
<div class="{{class_icon2}}" style="color: {{color}};"></div>
</a></li>
<li>
<a href="#processed_data_info_{{pd_id}}" role="tab" data-toggle="tab">ID: {{pd_id}}&nbsp;
<div class="{{class_icon1}}" style="color: {{color}};"></div>
<div class="{{class_icon2}}" style="color: {{color}};"></div>
<button class="close" title="Remove this processed data" type="button" onclick="delete_processed_data({{pd_id}})">&nbsp; ×</button>
</a>
</li>
{% end %}
</ul>
<!-- Create the tab panes -->
Expand Down