-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Alphabet] Alphabet types can no longer be constructed using their rank #2745
Comments
I think this is to prevent ambiguity as described here under "Assigning and retrieving values". For a one line initialization you could do this: using namespace seqan3::literals;
seqan3::phred42 quality_value{'A'_phred42}; |
Thanks for your answer. I am aware of the one line using the char representation, and I use it for nucleotide alphabets (where it makes sense, and is easily understandable). I currently need this in my tests to init qualified types, e.g. I what I did before was something like: seqan3::qualified<dna5, phred42> nucleotide1{'A'_dna5, phred42{37}};
seqan3::qualified<dna5, phred42> nucleotide2{'A'_dna5, phred42{37}};
seqan3::qualified<dna5, phred42> nucleotide3{'A'_dna5, phred42{37}}; This is also why I need a one-liner. Having multiple lines just to init these qualified values is not appropriate. What I would have with your suggestion would be: seqan3::qualified<dna5, phred42> nucleotide1{'A'_dna5, '%'_phred42};
seqan3::qualified<dna5, phred42> nucleotide2{'A'_dna5, 'C'_phred42};
seqan3::qualified<dna5, phred42> nucleotide3{'A'_dna5, '!'_phred42}; which is not really intuitive to read (could you tell me what quality '%' stands for?). Of course I can always comment, but you know what I mean ... For now, I am actually going with seqan3::qualified<dna5, phred42> nucleotide1{'A'_dna5, seqan3::phred42{}.assign_phred(37)}; which works and would even easily allow to use template types (which the operators don't), something like: template<typename dna_t, typename qual_t>
void whatever_function()
{
seqan3::qualified<dna_t, qual_t> nucleotide1{dna_t{}.assign_char('A'), qual_t{}.assign_phred(37)};
} I just don't find this solution very nice and thought there might be a nicer solution for this. |
Hi @tloka, thank you for the input :) One of our initial design goals for alphabets were to disallow initialising an alphabet by builtin types. For some reasons we did introduce this exception for phred-scores. Another problem is that seqan3/include/seqan3/alphabet/quality/phred_base.hpp Lines 65 to 68 in c7d7f1a
this constructor is implicit. That means
This shows that the API isn't clear. The given value is actually the PHRED-score value and not the rank. We currently have three ways to assign a value to a phred alphabet, That means seqan3::phred42 p = 0; // is that rank or phred score?
seqan3::phred42 p = -5; // is that rank or phred score?
seqan3::phred42 p = 'I'; // is that rank or phred score? were all valid constructors. To disambiguate those, we disallow a direct construction and use explicit initialisation by their assignment functions. Furthermore, the most generic way would be to use using some_phred_type = seqan3::phred42;
seqan3::assign_phred_to(some_phred_type{}, 34); But, from what I read so far is that you don't really miss this particular constructor, but you want to have a short-way to initialise a known at compile time phred score value. I think introducing something like this: int main()
{
auto cutoff_phred = 14_phred42; // short for seqan3::phred42.assign_phred(14)
auto legacy = -3_phred68solexa;
auto by_char = 'I'_phred42;
} would solve that problem, too. (See https://godbolt.org/z/3aKz3s41q for an example :)) |
Hey @marehr,
I think this is solution is pretty much what I was looking for. I wasn't aware of this function, and the behavior that you can use it with an rvalue returning a copy. So, if I understand it correctly, it would work to use the following for qualified types: using some_phred_type = seqan3::phred42
seqan3::qualified<dna5, some_phred_type> nucleotide1{'A'_dna5, seqan3::assign_phred_to(some_phred_type{}, 37)}; which I already find somehow nicer than the For your last suggestion, wouldn't something like this auto cutoff_phred = 14_phred42; // short for seqan3::phred42.assign_phred(14) again produce the same ambiguity as before for the constructors (is it the rank or the phred score)? At the same time, these would not work when using a template type or type alias. It would probably be the solution with the shortest syntax, but very limited to use. What I would probably like would be some generic initialization function template(s) for alphabets that you can clearly define in an unambiguous way, like // Template function definition
template<typename T> // TODO: restrict for alphabets
constexpr T init_by_rank(int rank)
{
return T{}.assign_rank(rank);
}
// ... and others
// ... and then use it like this
using some_alphabet_type = seqan3::phred42;
some_alphabet_type some_quality = seqan3::init_by_rank<some_alphabet_type>(37); // Available for all alphabet types
some_alphabet_type some_quality2 = seqan3::init_by_phred<some_alphabet_type>(25); // Only available for phred types
some_alphabet_type some_quality3 = seqan3::init_by_char<some_alphabet_type>('A'); // Available for all alphabet types
some_alphabet_type some_quality4 = seqan3::init_by_rank(37); // Auto deduction should be easily possible
// I guess, for qualified alphabets it would even enable
seqan3::qualified<seqan3::dna5, some_alphabet_type>{'A'_dna5, seqan3::init_by_phred(37)}; // (hopefully) auto deducted .. not sure though
// Note: I didn't think about all details, and probably it's not working exactly like this, but I guess you understand roughly what I mean.. Not as short, but unambiguous, intuitive (imo), possible to use with template types and aliases, possible to implement as In every case, it would be helpful to have examples for |
True, we would need some documentation to explain that no alphabet provides literals for the rank representation and this is actually the PHRED-score value.
All literals only work for constant-expressions (e.g. values that are known at compile time). some_alphabet_type some_quality4 = seqan3::init_by_rank(37); // Auto deduction should be easily possible
// I guess, for qualified alphabets it would even enable
seqan3::qualified<seqan3::dna5, some_alphabet_type>{'A'_dna5, seqan3::init_by_phred(37)}; // (hopefully) auto deducted .. not sure though What should the return type of I think your suggestion would lead to a completely different API. The main problem is that it assumes a unified way to construct/create an alphabet. Right now we don't require any alphabet to be constructible in a certain way, this allows us for example to have proxy-alphabets that can only be constructed with a reference alphabet (i.e. no default constructor).
are designed in such a way that they modify the state, but don't create the state. I think this is a more "generic"-way of API design, your solution wouldn't allow for that. I understand the desire to have something equally short as We currently have this free-function design: // what's the difference between
seqan3::assign_rank_to(some_alphabet_type{}, 37); // note construction is done OUTSIDE the function, we assume how to construct it
// and
seqan3::init_by_rank<some_alphabet_type>(37); // init_by_rank needs to assume how to construct some_alphabet_type, we have no control over that
// ? I don't think a strong-type like seqan3::dna4{seqan3::alphabet_rank{42}}; // strong-type alphabet_rank would be much better. seqan3::dna4::by_rank(42); // strong-type alphabet_rank something like that wouldn't be universal. |
Where would you expect to find such an example? Some locations:
Independent of that, it seems like |
ok, I think you are right, and what I would be looking for is simply not possible. // what's the difference between
seqan3::assign_rank_to(some_alphabet_type{}, 37); // note construction is done OUTSIDE the function, we assume how to construct it
// and
seqan3::init_by_rank<some_alphabet_type>(37); // init_by_rank needs to assume how to construct some_alphabet_type, we have no control over that
// ? I think the remaining points are very subjective, so I don't see where I have a valid point left 🤪 However, I think my previous closing is still valid
Also in the cookbook, you just show construction using literals and lvalue construction with follow-up assignment: // Two objects of seqan3::dna4 alphabet constructed with a char literal.
seqan3::dna4 ade = 'A'_dna4;
seqan3::dna4 gua = 'G'_dna4;
// Two additional objects assigned explicitly from char or rank.
seqan3::dna4 cyt, thy;
cyt.assign_char('C');
thy.assign_rank(3); I think, you should add at least the following examples: // assign_rank and assign_char can also be used with rvalues. This is, for example, useful for one-line construction of a qualified type
seqan3::qualified<seqan3::dna4, seqan3::phred42>(seqan3::dna4{}.assign_char('C'), seqan3::phred42{}.assign_rank(10));
// The same can be achieve with free functions
seqan3::qualified<seqan3::dna4, seqan3::phred42>(seqan3::assign_char_to(seqan3::dna4{}, 'C'), seqan3::assign_rank_to(seqan3::phred42{}, 10);
// Importantly, this way of alphabet type construction also works with type aliases and template types
using dna_t = seqan3::dna4;
dna_t cyt2 = tna_t{}.assign_char('C'); |
@eseiler Maybe I am not using the documentation 100% as I should. I usually check the cookbook and the snippets directly (in the repo). For my exact example, in both places, I couldn't find how to construct a qualified type with specific values in a single line (EDIT: without using literals; I needed to use it with a type alias). I think, in both places, I even didn't find how to perform rvalue construction with specific values (examples only show the assignment to lvalues). But maybe I just missed something ... EDIT2: I think the main point is the use of type aliases and template types. You mainly focus on literals in the examples - but these don't work in these situations. At the same time, I wasn't aware that the free function can be used with rvalues returning a copy (however, I found that in the documentation later) .. ok ,maybe it's just my fault 🤪 |
I think we could do (either or both):
Depends on how much is "left over" after doing the snippets. I think we should check if we can link |
With thinking a bit about it and having one more look in the documentation, I think the documentation of the |
I have just glanced through all comments, and I'm not sure if you already suggested this. Maybe we should add a notice box to all member functions that are there to make class model a certain concept / customisation point object (CPO). Oh, I just saw that alphabet_base does that for seqan3::alphabet_base::to_char, seqa3::alphabet_base::to_rank, etc... But, adding a snippet there would be helpful, too. seqan3::phred_base::assign_phred just says:
I think no-one knows what wrapper mean in this case. I think it should just say:
PLUS a snippet showing "free" function (CPO) and member function usage.
Technically speaking this isn't a free function, but a functor (i.e. class with call operator) instance with some special overload rules which is called a CPO. We need to fix an alphabet for the member functions, too, as they all inherit from
Sounds reasonable. |
Yeah, just to sum it up and sort it a bit from my side:
The documentation is really good and extensive once you find the correct place. For me, I was just not able to find the right places (though I must admit that I also didn't look too intensively into the documentation this time; but the first places where I looked didn't point me to the place where I would find the information I needed - this is what could probably be improved by adding the examples suggested above). |
I am just working on an SeqAn3 upgrade to have the newest version running.
I recognized with the quality alphabets (in my case mostly phred42), that it is no longer possible to construct them using a rank value, e.g., what I could do before was:
Is it the desired behavior that this is no longer possible?
If yes, why was it removed? Especially for quality values, it is very useful to allow initialization by rank.
What would be your recommendation to initialize an alphabet element efficiently and with single-line code (I didn't find something helpful in the snippets)? Is there some useful function I didn't find to quickly convert a numeric value to an alphabet type intepreting the numeric value as its rank? Assigning the rank after construction feels "wrong" and is probably also not the most efficient solution...
The text was updated successfully, but these errors were encountered: