Different labels for same peptide-MHC complex #37

Open
DanLep97 opened this issue Jul 27, 2022 · 1 comment


@DanLep97
Collaborator

Some entries in the MHCflurry database have different labels for identical peptide-MHC complexes.

From the MHCflurry database, the EAAGIGILTV peptide has different measurements for the same allele:

HLA-A*02:01,EAAGIGILTV,2272.0,=,quantitative,affinity,Rosenberg - purified MHC/competitive/radioactivity
HLA-A*02:01,EAAGIGILTV,14560.0,=,quantitative,affinity,Ovaa - purified MHC/competitive/fluorescence
HLA-A*02:01,EAAGIGILTV,500.0,<,qualitative,affinity,Sewell - cellular MHC/direct/fluorescence
HLA-A*02:01,EAAGIGILTV,5000.0,<,qualitative,affinity,Sewell - cellular MHC/direct/fluorescence

There are a lot of cases like this one.
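Cases like this can be surfaced with a small script. The following is a minimal sketch, not the project's actual tooling: it assumes rows in the column order shown above (allele, peptide, value, inequality, ...), thresholds the reported value at 500 nM, treats a value of exactly 500 as a binder, and ignores the inequality column. The function name `find_conflicting_pairs` is hypothetical.

```python
import csv
import io
from collections import defaultdict

CUTOFF_NM = 500.0  # binder/non-binder threshold (assumption: value <= cutoff is a binder)


def find_conflicting_pairs(rows, cutoff=CUTOFF_NM):
    """Return {(allele, peptide): [values]} for pairs whose duplicates disagree.

    Simplification: only the reported value is thresholded; the
    inequality column ("=", "<", ">") is ignored.
    """
    values = defaultdict(list)
    for row in rows:
        allele, peptide, value = row[0], row[1], float(row[2])
        values[(allele, peptide)].append(value)

    conflicts = {}
    for key, vals in values.items():
        labels = {v <= cutoff for v in vals}  # set of binder / non-binder labels
        if len(vals) > 1 and len(labels) > 1:
            conflicts[key] = vals
    return conflicts


# The four rows quoted in this issue.
sample = """\
HLA-A*02:01,EAAGIGILTV,2272.0,=,quantitative,affinity,Rosenberg - purified MHC/competitive/radioactivity
HLA-A*02:01,EAAGIGILTV,14560.0,=,quantitative,affinity,Ovaa - purified MHC/competitive/fluorescence
HLA-A*02:01,EAAGIGILTV,500.0,<,qualitative,affinity,Sewell - cellular MHC/direct/fluorescence
HLA-A*02:01,EAAGIGILTV,5000.0,<,qualitative,affinity,Sewell - cellular MHC/direct/fluorescence
"""
rows = list(csv.reader(io.StringIO(sample)))
conflicts = find_conflicting_pairs(rows)
```

For the rows above, the pair is flagged because the 500.0 entry falls on the binder side of the cutoff while the other three do not.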

@heleensev
Collaborator

[Figure: ba_duplicates_pie]

This is the distribution of duplicate labels. There are 187,485 entries in total. "same" means that all duplicates of a peptide-MHC pair would be classified identically (all binders or all non-binders) with a 500 nM cutoff.
From this you could conclude that 2% of the data is noisy. However, you could also read it as an indication that roughly 12% (2/18: different/same) of the data could be noisy. In other words, we cannot say that the non-duplicated entries are always correct, because there are no duplicates to corroborate them.
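The breakdown described here could be computed roughly as follows. This is a sketch under assumptions: entries are counted individually (not per pair), a value at or below 500 nM counts as a binder, and the toy data is invented purely for illustration and does not come from the MHCflurry dataset. The name `agreement_shares` is hypothetical.

```python
from collections import Counter, defaultdict


def agreement_shares(entries, cutoff=500.0):
    """Count entries per category: "unique", "same", or "different".

    entries: iterable of (allele, peptide, value_nM) tuples.
    "same" / "different" refer to whether all duplicates of a pair
    land on the same side of the cutoff.
    """
    groups = defaultdict(list)
    for allele, peptide, value in entries:
        groups[(allele, peptide)].append(value)

    counts = Counter()
    for vals in groups.values():
        if len(vals) == 1:
            category = "unique"
        elif len({v <= cutoff for v in vals}) == 1:
            category = "same"
        else:
            category = "different"
        counts[category] += len(vals)  # count entries, not pairs
    return counts


# Invented toy data, for illustration only.
toy = [
    ("HLA-A*02:01", "PEPTIDEAA", 100.0),   # unique
    ("HLA-A*02:01", "PEPTIDEBB", 50.0),    # same (both binders)
    ("HLA-A*02:01", "PEPTIDEBB", 80.0),
    ("HLA-A*02:01", "PEPTIDEDD", 7000.0),  # same (both non-binders)
    ("HLA-A*02:01", "PEPTIDEDD", 9000.0),
    ("HLA-A*02:01", "PEPTIDECC", 200.0),   # different
    ("HLA-A*02:01", "PEPTIDECC", 9000.0),
]
counts = agreement_shares(toy)

# Ratio used in the comment above: different entries relative to same entries.
different_to_same = counts["different"] / counts["same"]
```

Dividing by `counts["same"] + counts["different"]` instead would give the conflicting share among all duplicated entries, which is a slightly smaller number.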
