Skip to content

Conversation

@antgonza
Copy link
Member

This fixes #1236 as discussed in there and during the last call. The best way to test this was to add a new biom artifact in a study that already had one => added in populate_test_db.sql (artifact 5). This change will also be helpful as we move to test more than one artifact at the time. However, this change also make some scripts to failed and I'm fixing them. I check all errors in my laptop and flake8 so we should be OK to review ASAP.

from biom.util import biom_open
import pandas as pd
from skbio.util import find_duplicates
from skbio.util import flatten
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flatten has been removed in later versions of skbio. In other parts of the code, we have been using the construction below to use flatten:

from itertools import chain
list(chain.from_iterable(ITERABLE))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K

@josenavas
Copy link
Contributor

thanks @antgonza - some comments

# different stages and possible errors.
samples = self.samples
# figuring out if we are going to have duplicated samples, again
# doing it here cause it's computetional cheaper
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

computetional -> computational

@mortonjt
Copy link
Contributor

Super minor typos. 👍 otherwise.

to_concat = []
for aid, samps in viewitems(samples):
qdb.sql_connection.TRN.add(sql, [aid])
qm_fp = qdb.sql_connection.TRN.execute_fetchindex()[0][1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized - if you are accession only to [0][1], doesn't it mean that the above query is retrieving more info than needed? I think it should only be SELECT filepath and then here you can use execute_fetchlast. I think it will make the code cleaner and also removes the magic numbers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

@josenavas
Copy link
Contributor

Few minor comments.

@josenavas
Copy link
Contributor

👍 given tests pass

josenavas added a commit that referenced this pull request Feb 17, 2016
@josenavas josenavas merged commit 6d4e8c9 into qiita-spots:master Feb 17, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ensure unique IDs for metaanalysis

4 participants