This is the distribution of duplicate labels. There are a total of 187,485 entries. "Same" means that all duplicates of an entry would be classified identically, either all as binders or all as non-binders, with a 500 nM cutoff.
From this you could conclude that 2% of the data is noisy. However, you could also read it as an indication that 12% (2/18: different/same) of the data could be noisy. In other words, we cannot say that the non-duplicated entries are always correct, because there are no duplicates to corroborate them.
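For anyone who wants to reproduce this kind of count, here is a minimal sketch of how the same/different split could be computed. It is not the script used for the numbers above. The row layout `(allele, peptide, affinity_nM)`, the function name `label_distribution`, the toy values, and the convention that an affinity at or below the cutoff counts as a binder are all assumptions made for illustration.

```python
from collections import defaultdict

BINDER_CUTOFF_NM = 500.0  # cutoff mentioned in this thread, in nM


def label_distribution(rows, cutoff=BINDER_CUTOFF_NM):
    """Count entries whose peptide-MHC pair is unique, duplicated with
    consistent labels ("same"), or duplicated with conflicting labels
    ("different").

    rows: iterable of (allele, peptide, affinity_nM) tuples (assumed layout).
    """
    groups = defaultdict(list)
    for allele, peptide, affinity in rows:
        # Assumption: affinity <= cutoff means "binder".
        groups[(allele, peptide)].append(affinity <= cutoff)

    counts = {"unique": 0, "same": 0, "different": 0}
    for labels in groups.values():
        if len(labels) == 1:
            counts["unique"] += 1
        elif all(labels) or not any(labels):
            counts["same"] += len(labels)  # counted per entry, not per group
        else:
            counts["different"] += len(labels)
    return counts


# Toy data with made-up alleles, peptides, and affinities (not from MHCflurry).
example_rows = [
    ("ALLELE_A", "PEPTIDE1", 50.0),
    ("ALLELE_A", "PEPTIDE1", 120.0),   # duplicate, both binders
    ("ALLELE_A", "PEPTIDE2", 30.0),
    ("ALLELE_A", "PEPTIDE2", 9000.0),  # duplicate, conflicting labels
    ("ALLELE_B", "PEPTIDE1", 700.0),   # no duplicate
]
counts = label_distribution(example_rows)
print(counts)
```

Counting per entry rather than per group keeps the result comparable to percentages of the total number of entries.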
Some entries in the MHCflurry database have different labels for identical peptide-MHC complexes.
From the MHCflurry database, the EAAGIGILTV peptide has different measurements for the same allele:
There are many cases like this one.