Various tools for assembling directly from the graph, including using labels to track paths. #1412

Merged
merged 97 commits into from
Oct 31, 2016
Commits
97 commits
5e41d57
Merge branch 'update/remove_labelptr' into feature/pathlink
ctb Jun 9, 2016
8d77afe
basic compilation works
ctb Jun 9, 2016
b1e99ed
track labels in collapsed linear paths
ctb Jun 9, 2016
a432f0e
cleanup & commenting
ctb Jun 9, 2016
b3a7c3f
add assemble_labeled_path
ctb Jun 9, 2016
ddedd03
compile, basic fns & API
ctb Jun 9, 2016
8c6d069
changed fn signature of assemble_labeled_right
ctb Jun 9, 2016
d58a139
seems to be ...working?
ctb Jun 9, 2016
a707942
made left + right assembly work
ctb Jun 9, 2016
3740422
a first attempt at streaming assembly
ctb Jun 9, 2016
dc517f8
Merge branch 'update/remove_labelptr' into feature/pathlink
ctb Jun 10, 2016
fcf61c0
basic stuff is working
ctb Jun 10, 2016
5e08729
fix print function py2 issue
ctb Jun 10, 2016
a3e6cfa
track visited & avoid infinite loops
ctb Jun 12, 2016
de261f6
add extract-unassembled-reads*py
ctb Jun 15, 2016
04168a6
Merge branch 'master' of github.com:dib-lab/khmer into feature/pathlink
ctb Jun 25, 2016
22393ea
Merge branch 'feature/assemble' into feature/pathlink
ctb Jun 26, 2016
7125a33
Merge branch 'update/remove_labelptr' into feature/pathlink
ctb Jun 26, 2016
6e9650e
Merge branch 'master' of github.com:dib-lab/khmer into feature/pathlink
ctb Jun 27, 2016
c63f31e
fix typo
ctb Jun 27, 2016
250a997
Merge branch 'master' of github.com:dib-lab/khmer into feature/pathlink
ctb Jun 27, 2016
e994c91
fixed bug with 'temporary' use of std:string
ctb Jul 5, 2016
8b3a22b
Merge branch 'fix/readpairiter_error' into feature/pathlink
ctb Jul 18, 2016
1a61e5c
Merge branch 'master' of https://github.com/dib-lab/khmer into featur…
ctb Jul 18, 2016
aa7e7b6
update extract-unassembled-reads* sandbox scripts for python3
ctb Jul 18, 2016
fd9d82c
fix @ljcohen short sequence issue?
ctb Jul 18, 2016
1ea39f1
Make build_kmer const method
camillescott Jul 25, 2016
da2408a
First pass assembler class
camillescott Jul 25, 2016
55350d1
Add an assemble_left function to match assemble_right
camillescott Jul 26, 2016
e33cef4
TEMPORARILY add github target for screed bugfix to pass tests
camillescott Jul 26, 2016
df3a859
TEMPORARILY add github target to all target for testing
camillescott Jul 26, 2016
d135d8d
Add TEMP screed install to install deps, roll back other stuff
camillescott Jul 26, 2016
ef41583
Fix pep8 for jenkins
camillescott Jul 26, 2016
80df67f
Add node filtering functions
camillescott Jul 26, 2016
df30190
Add more debugging output
camillescott Jul 27, 2016
bbc80fd
Turn on debugging
camillescott Jul 27, 2016
40e4766
Merge branch 'feature/review/pathlink' of github.com:ged-lab/khmer in…
camillescott Jul 27, 2016
f663b6c
Update tag and label getters to take a reference to Tag and Label sets
camillescott Aug 5, 2016
37b8fad
Create a global collection of symbol alphabets
camillescott Aug 5, 2016
357f8be
Add symbols.hh and cc to setup.py
camillescott Aug 5, 2016
5ee5eb9
Add tracking info assembly script; add specialized assembler traverser
camillescott Aug 5, 2016
eb8857d
Expose reverse complement function to Python land
camillescott Aug 9, 2016
0d288b3
Add import for reverse_complement
camillescott Aug 10, 2016
1bb912b
Add string representation for Kmer
camillescott Aug 10, 2016
5220f41
Modify AssemblerTraverser to only emit found bases; update Assembler …
camillescott Aug 10, 2016
7fb2006
Check for orientation of seed kmer
camillescott Aug 10, 2016
80cd04c
Remove some debugging output
camillescott Aug 10, 2016
7b2c5ff
Add a template parameter for the direction of the AssemblerTraverser.
camillescott Aug 11, 2016
a298b48
Fix direction parameters
camillescott Aug 11, 2016
cf55bd7
Fix syntax error from bad patch
camillescott Aug 11, 2016
e42c89c
Replace function redirection with template specialization
camillescott Aug 11, 2016
37596f9
First pass at LabeledLinearAssembler
camillescott Aug 11, 2016
5ee1b9a
Refactor labeled assembly into LabeledLinearAssembler, right assembly…
camillescott Aug 12, 2016
3ddeeec
Fill out direction templating for Labeled assembly, all tests passing
camillescott Aug 13, 2016
6f07427
explicitly add assembler functions to khmer namespace instead of a us…
camillescott Aug 15, 2016
8217e35
Reorganize AssemblerTraverser into traversal
camillescott Aug 15, 2016
d59cd3e
Try declaring template specs
camillescott Aug 16, 2016
93fcabc
Remove const qualifier on SeenSet * visited to allow call to insert
camillescott Aug 16, 2016
341f3d2
Add new headers to liboxli Makefile
camillescott Aug 16, 2016
5377440
Make Pep8 compliant, update _equals_rc to use new khmer rc method
camillescott Aug 16, 2016
1a81fcd
Remove -Wmaybe-uninitialized compiler warning at traversal.cc 225
camillescott Aug 16, 2016
3ab109c
Update ChangeLog
camillescott Sep 26, 2016
1c09279
Update from master into temp branch
camillescott Sep 26, 2016
ebe35a9
Merge master
camillescott Sep 26, 2016
44af1ea
Cleanup some commented code and add some documentation to LabeledLine…
camillescott Sep 26, 2016
a98a10b
Do autoformat
camillescott Sep 26, 2016
f256d49
Refactor Traverser into two new classes, NodeGatherer and NodeCursor,…
camillescott Sep 27, 2016
011cdd6
LabeledLinearAssembler renamed to NaiveLabeledAssembler; remove redud…
camillescott Sep 30, 2016
9834c59
Merge branch 'feature/assemble' into feature/review/pathlink
ctb Oct 1, 2016
d2788fd
Merge branch 'master' of github.com:dib-lab/khmer into feature/review…
ctb Oct 1, 2016
281bae1
Merge branch 'master' of github.com:dib-lab/khmer into feature/review…
ctb Oct 1, 2016
0e29e73
Change alphabets to string and iterate with auto for loop; rename sym…
camillescott Oct 3, 2016
dac2e26
Add an additional revcomp check and mutate to be sure that all contig…
camillescott Oct 3, 2016
3344316
Remove unused set_cursor; allow an offset for join_contigs
camillescott Oct 3, 2016
8a1ad98
Rename labeled assembler, enable proper branching along with simple t…
camillescott Oct 3, 2016
b7be942
Merge branch 'feature/assembly/tipkiller' into feature/review/pathlink
camillescott Oct 3, 2016
96787bf
Revert setup.py compile arg change
camillescott Oct 3, 2016
7b7ff69
Add updated doxygen comments and move a couple function to their base…
camillescott Oct 4, 2016
151e366
Remove sneaky alphabet from Makefile
camillescott Oct 4, 2016
f36129d
Move assembly tests to their own file and add some utilities (with te…
camillescott Oct 4, 2016
49b3d3a
Rename some tests, cluster with Classes, add clear comments with grap…
camillescott Oct 4, 2016
80a6959
remove interrupted test, as it is not necessary and covered by test_r…
camillescott Oct 4, 2016
3f41db3
Rewrite assembly tests with fixtures, parameterize for sequence lengt…
camillescott Oct 5, 2016
7a32a8a
Allow passing stop_bf to labeled assembler from Pythonland
camillescott Oct 6, 2016
8ed3587
Add graph structure fixtures for forks and bubbles, and convert tests…
camillescott Oct 6, 2016
2a7af83
Remove debug flags
camillescott Oct 6, 2016
0bc8e19
Merge master
camillescott Oct 6, 2016
1fb22be
Add encoding to test_assembly.py
camillescott Oct 6, 2016
8f37482
pep8 compliance
camillescott Oct 6, 2016
2fdf515
Fix some pep8 and a dumb error
camillescott Oct 6, 2016
ab47816
Convert SimpleLabeledAssembler to iterative impl
camillescott Oct 7, 2016
048a6d2
Add a test for tandem repeats
camillescott Oct 7, 2016
32929f6
Change signature of AssemblerTraverser::next_symbol to virtual to all…
camillescott Oct 7, 2016
aaa970f
Merge branch 'master' into feature/review/pathlink
ctb Oct 30, 2016
c5b8d52
Merge branch 'master' into feature/pathlink
betatim Oct 31, 2016
62dd982
Formatting only changes via `make format`
betatim Oct 31, 2016
361511a
fix pep8
ctb Oct 31, 2016
Formatting only changes via `make format`
betatim committed Oct 31, 2016
commit 62dd98263034470fabf2c911b2ec3445919d6afa
8 changes: 4 additions & 4 deletions lib/alphabets.cc
@@ -42,10 +42,10 @@ namespace khmer
namespace alphabets
{

-std::string DNA_SIMPLE = "ACGT";
-std::string DNAN_SIMPLE = "ACGTN";
-std::string IUPAC_NUCL = "ACGTURYSWKMBDHVN.-";
-std::string IUPAC_AA = "ACDEFGHIKLMNPQRSTVWY";
+std::string DNA_SIMPLE = "ACGT";
+std::string DNAN_SIMPLE = "ACGTN";
+std::string IUPAC_NUCL = "ACGTURYSWKMBDHVN.-";
+std::string IUPAC_AA = "ACDEFGHIKLMNPQRSTVWY";

}
}
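Since commit 0e29e73 the alphabets in `khmer::alphabets` are plain `std::string` values that are iterated with range-based `for` loops. A minimal sketch of how such a string could be used to validate a sequence follows. The `sketch` namespace, the local copies of the strings and the `is_valid_sequence` helper are hypothetical and not part of khmer's API.

```cpp
#include <cassert>
#include <string>

namespace sketch
{
// Local copies of two alphabets from lib/alphabets.cc (not linked from liboxli).
const std::string DNA_SIMPLE = "ACGT";
const std::string DNAN_SIMPLE = "ACGTN";

// Hypothetical helper: true if every symbol of `seq` appears in `alphabet`.
bool is_valid_sequence(const std::string& seq, const std::string& alphabet)
{
    for (auto symbol : seq) {
        if (alphabet.find(symbol) == std::string::npos) {
            return false;
        }
    }
    return true;
}
}
```

With these definitions, `is_valid_sequence("ACGN", sketch::DNA_SIMPLE)` is false, while the same call with `DNAN_SIMPLE` is true.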
11 changes: 7 additions & 4 deletions lib/assembler.cc
@@ -138,7 +138,8 @@ const
}

template<>
-std::string LinearAssembler::_assemble_directed<RIGHT>(AssemblerTraverser<RIGHT>& cursor)
+std::string LinearAssembler::_assemble_directed<RIGHT>
+(AssemblerTraverser<RIGHT>& cursor)
const
{
std::string contig = cursor.cursor.get_string_rep(_ksize);
@@ -227,11 +228,12 @@ const
{
#if DEBUG_ASSEMBLY
std::cout << "## assemble_labeled_directed_" << direction << " [start] at " <<
-start_cursor.cursor.repr(_ksize) << std::endl;
+start_cursor.cursor.repr(_ksize) << std::endl;
#endif

// prime the traversal with the first linear segment
-std::string root_contig = linear_asm->_assemble_directed<direction>(start_cursor);
+std::string root_contig = linear_asm->_assemble_directed<direction>
+(start_cursor);
#if DEBUG_ASSEMBLY
std::cout << "Primed: " << root_contig << std::endl;
std::cout << "Cursor: " << start_cursor.cursor.repr(_ksize) << std::endl;
@@ -295,7 +297,8 @@ const
branch_starts.pop();

#if DEBUG_ASSEMBLY
-std::cout << "Branch cursor: " << branch_cursor.cursor.repr(_ksize) << std::endl;
+std::cout << "Branch cursor: " << branch_cursor.cursor.repr(
+_ksize) << std::endl;
#endif

// assemble linearly as far as possible
6 changes: 4 additions & 2 deletions lib/assembler.hh
@@ -98,10 +98,12 @@ public:
// The explicit specializations need to be declared in the same translation unit
// as their unspecialized declaration.
template<>
-std::string LinearAssembler::_assemble_directed<LEFT>(AssemblerTraverser<LEFT> &cursor) const;
+std::string LinearAssembler::_assemble_directed<LEFT>(AssemblerTraverser<LEFT>
+&cursor) const;

template<>
-std::string LinearAssembler::_assemble_directed<RIGHT>(AssemblerTraverser<RIGHT> &cursor) const;
+std::string LinearAssembler::_assemble_directed<RIGHT>(AssemblerTraverser<RIGHT>
+&cursor) const;


/**
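The header comment above concerns explicit specializations of a member function template on a `bool` direction. The underlying C++ rule is that an explicit specialization must be declared before the first use that would cause an implicit instantiation, in every translation unit where such a use occurs; declaring the specializations in the header, as `assembler.hh` does, satisfies that. A self-contained sketch of the pattern follows. `ToyAssembler` and the `LEFT`/`RIGHT` constants are hypothetical stand-ins, since khmer's own direction definitions are not shown in this diff.

```cpp
#include <cassert>
#include <string>

namespace sketch
{
// Stand-ins for khmer's LEFT/RIGHT direction flags.
constexpr bool LEFT = false;
constexpr bool RIGHT = true;

class ToyAssembler
{
public:
    // Primary member template: declared only, never defined generically.
    template<bool direction>
    std::string assemble_directed(const std::string& seed) const;
};

// Declare the explicit specializations next to the primary template (in a
// real project, in the header) so every caller sees them before use.
template<>
std::string ToyAssembler::assemble_directed<LEFT>(const std::string& seed)
const;

template<>
std::string ToyAssembler::assemble_directed<RIGHT>(const std::string& seed)
const;

// Definitions; in a real project these belong in exactly one .cc file.
template<>
std::string ToyAssembler::assemble_directed<LEFT>(const std::string& seed)
const
{
    return "<" + seed;  // placeholder for leftward extension
}

template<>
std::string ToyAssembler::assemble_directed<RIGHT>(const std::string& seed)
const
{
    return seed + ">";  // placeholder for rightward extension
}
}
```

Keeping the definitions out of the header avoids multiple-definition errors, because full specializations of function templates are not implicitly `inline`.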
28 changes: 14 additions & 14 deletions lib/kmer_filters.cc
@@ -82,8 +82,8 @@ KmerFilter get_label_filter(const Label label, const LabelHash * lh)


KmerFilter get_simple_label_intersect_filter(const LabelSet& src_labels,
-const LabelHash * lh,
-const unsigned int min_cov)
+const LabelHash * lh,
+const unsigned int min_cov)
{
auto src_begin = src_labels.begin();
auto src_end = src_labels.end();
@@ -92,19 +92,19 @@ KmerFilter get_simple_label_intersect_filter(const LabelSet& src_labels,
KmerFilter filter = [=] (const Kmer& node) {
LabelSet dst_labels;
lh->get_tag_labels(node, dst_labels);

LabelSet intersect;
std::set_intersection(src_begin, src_end,
-dst_labels.begin(), dst_labels.end(),
-std::inserter(intersect, intersect.begin()));
+dst_labels.begin(), dst_labels.end(),
+std::inserter(intersect, intersect.begin()));

if ((intersect.size() == 1)
-&& (dst_labels.size() == 1)
-&& (src_size >= min_cov)) {
+&& (dst_labels.size() == 1)
+&& (src_size >= min_cov)) {
#if DEBUG_FILTERS
std::cout << "TIP: " << intersect.size() << ", " <<
-dst_labels.size() << ", " << src_size << std::endl;
-#endif
+dst_labels.size() << ", " << src_size << std::endl;
+#endif
// putative error / tip
return true;
} else if (intersect.size() > 0) {
@@ -117,7 +117,7 @@ KmerFilter get_simple_label_intersect_filter(const LabelSet& src_labels,

return filter;
}


KmerFilter get_stop_bf_filter(const Hashtable * stop_bf)
{
@@ -132,12 +132,12 @@ KmerFilter get_visited_filter(const SeenSet * visited)
{
#if DEBUG_FILTERS
std::cout << "Create new visited filter with " << visited <<
-" containing " << visited->size() << " nodes" << std::endl;
+" containing " << visited->size() << " nodes" << std::endl;
#endif
KmerFilter filter = [=] (const Kmer& node) {
#if DEBUG_FILTERS
-std::cout << "Check visited filter (" << visited->size()
-<< " elems)" << std::endl;
+std::cout << "Check visited filter (" << visited->size()
+<< " elems)" << std::endl;
#endif
return set_contains(*visited, node);
};
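In `get_visited_filter`, a `KmerFilter` is a lambda that takes a `const Kmer&` and returns `true` when the node is already in the visited set. A minimal, self-contained sketch of that pattern together with a filter list follows. It assumes `KmerFilter` behaves like `std::function<bool (const Kmer&)>` and that `apply_kmer_filters` rejects a node as soon as any filter returns `true`; it also uses `uint64_t` and `std::set` as stand-ins for `Kmer` and `SeenSet`. None of these definitions appear in this diff.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <list>
#include <set>

namespace sketch
{
// Stand-in types; khmer's real Kmer, SeenSet and KmerFilterList differ.
typedef uint64_t Kmer;
typedef std::set<Kmer> SeenSet;
typedef std::function<bool (const Kmer&)> KmerFilter;
typedef std::list<KmerFilter> KmerFilterList;

// Assumed semantics: a node is filtered out if any filter returns true.
bool apply_kmer_filters(const Kmer& node, const KmerFilterList& filters)
{
    for (const auto& filter : filters) {
        if (filter(node)) {
            return true;
        }
    }
    return false;
}

// Mirrors get_visited_filter: the lambda captures the pointer by value,
// so later insertions into *visited are seen by the filter.
KmerFilter get_visited_filter(const SeenSet * visited)
{
    return [=] (const Kmer& node) {
        return visited->count(node) > 0;
    };
}
}
```

Because the lambda holds the `visited` pointer rather than a copy of the set, nodes inserted after the filter is created are still rejected.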
6 changes: 3 additions & 3 deletions lib/kmer_filters.hh
@@ -55,9 +55,9 @@ bool apply_kmer_filters(const Kmer& node, const KmerFilterList& filters);

KmerFilter get_label_filter(const Label label, const LabelHash * lh);

-KmerFilter get_simple_label_intersect_filter(const LabelSet& src_labels,
-const LabelHash * lh,
-const unsigned int min_cov = 5);
+KmerFilter get_simple_label_intersect_filter(const LabelSet& src_labels,
+const LabelHash * lh,
+const unsigned int min_cov = 5);

KmerFilter get_stop_bf_filter(const Hashtable * stop_bf);

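The first branch of `get_simple_label_intersect_filter` flags a neighbor as a putative error or tip under three conditions: it shares exactly one label with the source node, it carries only that one label, and the source has at least `min_cov` labels (default 5 in this header). A hypothetical stand-alone version of just that test is sketched below. It assumes `src_size` is `src_labels.size()`, because its definition lies outside the shown hunks. It uses `uint64_t` as a stand-in for `Label`, and it does not reproduce the later branches, which the diff only partly shows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <set>

namespace sketch
{
typedef uint64_t Label;          // stand-in for khmer's Label
typedef std::set<Label> LabelSet;

// Tip test from get_simple_label_intersect_filter, assuming src_size is
// the number of labels on the source node.
bool looks_like_tip(const LabelSet& src_labels, const LabelSet& dst_labels,
                    const unsigned int min_cov = 5)
{
    LabelSet intersect;
    // std::set keeps both ranges sorted, as std::set_intersection requires.
    std::set_intersection(src_labels.begin(), src_labels.end(),
                          dst_labels.begin(), dst_labels.end(),
                          std::inserter(intersect, intersect.begin()));

    return (intersect.size() == 1)
           && (dst_labels.size() == 1)
           && (src_labels.size() >= min_cov);
}
}
```

For a source labelled `{1, 2, 3, 4, 5}`, a neighbor labelled only `{3}` is flagged, while a neighbor labelled `{3, 4}` is not.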
4 changes: 2 additions & 2 deletions lib/kmer_hash.hh
@@ -198,8 +198,8 @@ public:
{
std::string s = "<Us=" + _revhash(kmer_u, K) + ", Fs=" +
_revhash(kmer_f, K) + ", Rs=" + _revhash(kmer_r, K) + ">";
-//", U=" + std::to_string(kmer_u) + ", F=" + std::to_string(kmer_f) +
-//", R=" + std::to_string(kmer_r) + ">";
+//", U=" + std::to_string(kmer_u) + ", F=" + std::to_string(kmer_f) +
+//", R=" + std::to_string(kmer_r) + ">";
return s;
}

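The `repr` string in this hunk decodes three hashes of one k-mer with `_revhash`: presumably the canonical ("U"), forward ("F") and reverse-complement ("R") forms. A sketch of the same output built on a simple 2-bit packing follows. The A=0, C=1, G=2, T=3 code, the `hash_kmer`/`revhash`/`revcomp_hash` helpers and the rule that the canonical form is the smaller of the two hashes are assumptions for illustration, not khmer's exact implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

namespace sketch
{
typedef uint64_t HashIntoType;

// Symbol order defines the 2-bit code A=0, C=1, G=2, T=3 (chosen for this
// sketch; not necessarily khmer's encoding).
const std::string BASES = "ACGT";

// Pack an ACGT k-mer (k <= 32) into an integer, first base in the high bits.
HashIntoType hash_kmer(const std::string& kmer)
{
    HashIntoType h = 0;
    for (auto base : kmer) {
        h = (h << 2) | static_cast<HashIntoType>(BASES.find(base));
    }
    return h;
}

// Analogue of _revhash: unpack a k-mer hash back into a string.
std::string revhash(HashIntoType h, unsigned int k)
{
    std::string s(k, 'A');
    for (unsigned int i = 0; i < k; ++i) {
        s[k - 1 - i] = BASES[h & 3];
        h >>= 2;
    }
    return s;
}

// Reverse complement on the packed form; here the complement of code x is 3 - x.
HashIntoType revcomp_hash(HashIntoType f, unsigned int k)
{
    HashIntoType r = 0;
    for (unsigned int i = 0; i < k; ++i) {
        r = (r << 2) | (3 - (f & 3));
        f >>= 2;
    }
    return r;
}

// Same layout as the repr above; canonical form assumed to be min(F, R).
std::string repr(HashIntoType kmer_f, unsigned int k)
{
    HashIntoType kmer_r = revcomp_hash(kmer_f, k);
    HashIntoType kmer_u = kmer_f < kmer_r ? kmer_f : kmer_r;
    return "<Us=" + revhash(kmer_u, k) + ", Fs=" + revhash(kmer_f, k) +
           ", Rs=" + revhash(kmer_r, k) + ">";
}
}
```

Under these assumptions, `repr(hash_kmer("GT"), 2)` yields `<Us=AC, Fs=GT, Rs=AC>`.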
34 changes: 17 additions & 17 deletions lib/traversal.cc
@@ -51,7 +51,7 @@ namespace khmer

template <bool direction>
NodeGatherer<direction>::NodeGatherer(const Hashtable * ht,
-KmerFilterList filters) :
+KmerFilterList filters) :
KmerFactory(ht->ksize()), graph(ht), filters(filters)
{
bitmask = 0;
@@ -63,15 +63,15 @@ NodeGatherer<direction>::NodeGatherer(const Hashtable * ht,


template <bool direction>
-NodeGatherer<direction>::NodeGatherer(const Hashtable * ht) :
+NodeGatherer<direction>::NodeGatherer(const Hashtable * ht) :
NodeGatherer(ht, KmerFilterList())
{
}


template <bool direction>
-NodeGatherer<direction>::NodeGatherer(const Hashtable * ht,
-KmerFilter filter) :
+NodeGatherer<direction>::NodeGatherer(const Hashtable * ht,
+KmerFilter filter) :
NodeGatherer(ht, KmerFilterList())
{
filters.push_back(filter);
@@ -107,7 +107,7 @@ const

template<bool direction>
unsigned int NodeGatherer<direction>::neighbors(const Kmer& node,
-KmerQueue & node_q)
+KmerQueue & node_q)
const
{
unsigned int found = 0;
@@ -149,8 +149,8 @@ const

template<bool direction>
NodeCursor<direction>::NodeCursor(const Hashtable * ht,
-Kmer start_kmer,
-KmerFilterList filters) :
+Kmer start_kmer,
+KmerFilterList filters) :
NodeGatherer<direction>(ht, filters)
{
cursor = start_kmer;
@@ -159,7 +159,7 @@ NodeCursor<direction>::NodeCursor(const Hashtable * ht,

template<bool direction>
NodeCursor<direction>::NodeCursor(const Hashtable * ht,
-Kmer start_kmer) :
+Kmer start_kmer) :
NodeCursor<direction>(ht, start_kmer, KmerFilterList())
{
}
@@ -190,17 +190,17 @@ const

Traverser::Traverser(const Hashtable * ht,
KmerFilterList filters) :
-KmerFactory(ht->ksize()),
-graph(ht),
+KmerFactory(ht->ksize()),
+graph(ht),
left_gatherer(ht, filters),
right_gatherer(ht, filters)
{
}

-Traverser::Traverser(const Hashtable * ht,
-KmerFilter filter) :
-KmerFactory(ht->ksize()),
-graph(ht),
+Traverser::Traverser(const Hashtable * ht,
+KmerFilter filter) :
+KmerFactory(ht->ksize()),
+graph(ht),
left_gatherer(ht, filter),
right_gatherer(ht, filter)
{
@@ -217,20 +217,20 @@ void Traverser::push_filter(KmerFilter filter)
unsigned int Traverser::traverse(const Kmer& node,
KmerQueue& node_q) const
{
-return left_gatherer.neighbors(node, node_q) +
+return left_gatherer.neighbors(node, node_q) +
right_gatherer.neighbors(node, node_q);
}


unsigned int Traverser::traverse_left(const Kmer& node,
-KmerQueue& node_q) const
+KmerQueue& node_q) const
{
return left_gatherer.neighbors(node, node_q);
}


unsigned int Traverser::traverse_right(const Kmer& node,
-KmerQueue& node_q) const
+KmerQueue& node_q) const
{
return right_gatherer.neighbors(node, node_q);
}
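`Traverser::traverse` returns the number of left neighbors plus the number of right neighbors, each gathered by a direction-templated `NodeGatherer`. The sketch below models that on a toy graph. A `std::unordered_set` of k-mer strings replaces the `Hashtable`, neighbors are found by shifting one base in each direction, and a single optional filter excludes nodes. These are simplifying assumptions: the sketch ignores reverse complements and the hashing khmer performs, and its names are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <unordered_set>

namespace sketch
{
typedef std::unordered_set<std::string> Graph;  // stand-in for Hashtable
typedef std::queue<std::string> KmerQueue;
typedef std::function<bool (const std::string&)> KmerFilter;

// Gather neighbors one base to the left (direction == false) or right
// (direction == true) that are in the graph and not rejected by the filter.
template<bool direction>
unsigned int neighbors(const Graph& graph, const std::string& node,
                       KmerQueue& node_q, const KmerFilter& filter)
{
    unsigned int found = 0;
    for (char base : std::string("ACGT")) {
        std::string next = direction
                           ? node.substr(1) + base
                           : base + node.substr(0, node.size() - 1);
        if (graph.count(next) && !(filter && filter(next))) {
            node_q.push(next);
            ++found;
        }
    }
    return found;
}

// Like Traverser::traverse: sum of left and right neighbor counts.
unsigned int traverse(const Graph& graph, const std::string& node,
                      KmerQueue& node_q, const KmerFilter& filter = nullptr)
{
    return neighbors<false>(graph, node, node_q, filter) +
           neighbors<true>(graph, node, node_q, filter);
}
}
```

In the graph `{"ACG", "CGT", "TAC", "CGA"}`, node `ACG` has one left neighbor (`TAC`) and two right neighbors (`CGA`, `CGT`), so `traverse` reports 3, or 2 if a filter rejects `CGT`.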
21 changes: 11 additions & 10 deletions lib/traversal.hh
@@ -81,12 +81,12 @@ protected:
public:

explicit NodeGatherer(const Hashtable * ht,
-KmerFilterList filters);
+KmerFilterList filters);

explicit NodeGatherer(const Hashtable * ht);

explicit NodeGatherer(const Hashtable * ht, KmerFilter filter);

/**
* @brief Push a new filter on to the filter stack.
*/
@@ -166,11 +166,11 @@ public:
explicit NodeCursor(const Hashtable * ht,
Kmer start_kmer,
KmerFilterList filters);

explicit NodeCursor(const Hashtable * ht,
Kmer start_kmer);

-explicit NodeCursor(const Hashtable * ht,
+explicit NodeCursor(const Hashtable * ht,
Kmer start_kmer,
KmerFilter filter);

@@ -182,7 +182,8 @@ public:
*
* @return Number of neighbors found.
*/
-unsigned int neighbors(KmerQueue& node_q) const {
+unsigned int neighbors(KmerQueue& node_q) const
+{
return NodeGatherer<direction>::neighbors(cursor, node_q);
}

@@ -201,7 +202,7 @@ class Traverser: public KmerFactory
{

protected:

const Hashtable * graph;
NodeGatherer<LEFT> left_gatherer;
NodeGatherer<RIGHT> right_gatherer;
@@ -213,7 +214,7 @@ public:

explicit Traverser(const Hashtable * ht) : Traverser(ht, KmerFilterList()) {}

-explicit Traverser(const Hashtable * ht,
+explicit Traverser(const Hashtable * ht,
KmerFilter filter);

void push_filter(KmerFilter filter);
@@ -268,7 +269,7 @@ public:
*
* @return The joined contig.
*/
-std::string join_contigs(std::string& contig_a,
+std::string join_contigs(std::string& contig_a,
std::string& contig_b,
WordLength offset = 0) const;
};
1 change: 1 addition & 0 deletions tests/khmer_tst_utils.py
@@ -60,6 +60,7 @@
def _equals_rc(query, match):
return (query == match) or (revcomp(query) == match)


def _contains_rc(match, query):
return (query in match) or (revcomp(query) in match)

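`_equals_rc` and `_contains_rc` let the tests accept a sequence in either orientation. A hedged C++ analogue, for illustration only, follows. The `revcomp` function here is a local helper written for this sketch that maps any non-ACGT symbol to `N`; it is not khmer's own reverse-complement function, and the `sketch` names are hypothetical.

```cpp
#include <cassert>
#include <string>

namespace sketch
{
// Reverse complement of an ACGT string; other symbols become 'N'.
std::string revcomp(const std::string& seq)
{
    std::string rc(seq.rbegin(), seq.rend());
    for (auto& c : rc) {
        switch (c) {
        case 'A': c = 'T'; break;
        case 'C': c = 'G'; break;
        case 'G': c = 'C'; break;
        case 'T': c = 'A'; break;
        default: c = 'N';
        }
    }
    return rc;
}

// Analogue of _equals_rc: query equals match in either orientation.
bool equals_rc(const std::string& query, const std::string& match)
{
    return query == match || revcomp(query) == match;
}

// Analogue of _contains_rc: query occurs in match in either orientation.
// The argument order (match, query) follows the Python helper.
bool contains_rc(const std::string& match, const std::string& query)
{
    return match.find(query) != std::string::npos ||
           match.find(revcomp(query)) != std::string::npos;
}
}
```

For example, `equals_rc("AACG", "CGTT")` is true because `CGTT` is the reverse complement of `AACG`.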