Outliers in the Dengue Dataset #83

Closed
Rohit-Satyam opened this issue Oct 10, 2024 · 3 comments · Fixed by #84
Labels
bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@Rohit-Satyam
Contributor

Rohit-Satyam commented Oct 10, 2024

Hill et al. (2024) identified the following accessions as outlier sequences in their recent publication (the list below is courtesy of Suraj Jagtap, who is on the paper). These sequences were found to result from re-sequencing of commonly used virus stocks.

@j23414 Do you plan to retain them or remove them? Kindly let me know (I see that there are some Nextclade people on the paper).

MW828678.1
MT076932.1
MT076933.1
MT076934.1
MT076935.1
MK506262.1
KU094071.1
KM204119.1
EU848545.1
MZ284953.1
MZ285732.1
MF576311.1
MW945433.1
MT076937.1
MH613984.1
MH613985.1
MH613986.1
MK506264.1
MK506263.1
KU094070.1
KM204118.1
KF704358.1
KF704357.1
KF704354.1
KF704355.1
KJ918750.1
KF704356.1
HQ891023.1
HQ891024.1
GQ398268.1
FJ906959.1
FJ390389.1
EU854293.1
MZ285058.1
KY586699.1
OP809583.1
LT996912.1
MT076948.1
MK506265.1
JN697379.1
KM190936.1
MW793460.1
MT076955.1
KY586824.1
KY586825.1
MW793459.1
KY586823.1
KY586826.1
MK506266.1

@Rohit-Satyam Rohit-Satyam added the bug Something isn't working label Oct 10, 2024
@j23414
Contributor

j23414 commented Oct 10, 2024

Hi @Rohit-Satyam! Would you feel comfortable submitting a PR to remove these outliers from the live build? Mostly it involves adding them to this exclude list

We usually leave off the version number (e.g. MK506266.1 -> MK506266) and add a comment with the reasoning, for example:

MK506266 # Outlier according to publication Hill et al, 2024 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11426435/)
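
For anyone preparing such a PR, the convention described above (drop the version suffix, append a `#` comment with the reasoning) could be scripted roughly as follows. This is only a sketch: the helper names `strip_version` and `exclude_line` are hypothetical, and the comment text is simply copied from the example line above.

```python
import re

# Comment text copied from the example line in this thread.
REASON = (
    "Outlier according to publication Hill et al, 2024 "
    "(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11426435/)"
)


def strip_version(accession):
    """Drop a trailing '.N' version suffix, e.g. 'MK506266.1' -> 'MK506266'."""
    return re.sub(r"\.\d+$", "", accession.strip())


def exclude_line(accession, reason=REASON):
    """Format one exclude-list entry: bare accession, then a '#' comment."""
    return f"{strip_version(accession)} # {reason}"


if __name__ == "__main__":
    # Two accessions taken from the list in this issue.
    for acc in ["MK506266.1", "MW828678.1"]:
        print(exclude_line(acc))
```

Accessions that already lack a version suffix pass through unchanged, so the helper can be run over a mixed list.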

@j23414 j23414 added the help wanted Extra attention is needed label Oct 10, 2024
@Rohit-Satyam
Contributor Author

Okay @j23414. I see 9 of them are already there, so I will add the remaining 40 accessions.
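
A check like the one described here (which accessions are already in the exclude list, and which still need adding) could be done along these lines. This is a sketch under assumptions: it treats the exclude file as plain text with one accession per line and optional `#` comments, the function names are hypothetical, and the sample exclude-file contents are made up for illustration rather than taken from the repository.

```python
def listed_accessions(exclude_text):
    """Collect bare accessions from exclude-file text, skipping comments and blank lines."""
    found = set()
    for line in exclude_text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop any trailing comment
        if entry:
            found.add(entry.split(".")[0])  # ignore a version suffix if present
    return found


def missing_accessions(candidates, exclude_text):
    """Return candidates (version stripped, order kept, no repeats) not yet excluded."""
    already = listed_accessions(exclude_text)
    seen = set()
    result = []
    for candidate in candidates:
        bare = candidate.strip().split(".")[0]
        if bare and bare not in already and bare not in seen:
            seen.add(bare)
            result.append(bare)
    return result


if __name__ == "__main__":
    # Hypothetical exclude-file contents, for illustration only.
    sample_exclude = "# existing entries\nMK506266 # some earlier reason\n"
    print(missing_accessions(["MK506266.1", "MW828678.1"], sample_exclude))
```

Comparing on the bare accession means an entry is recognized as present whether or not either side carries the `.1` suffix.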

Rohit-Satyam added a commit to Rohit-Satyam/dengue that referenced this issue Oct 12, 2024
Following outlier format reported in [issue 83](nextstrain#83 (comment))
j23414 pushed a commit that referenced this issue Oct 14, 2024
Excluded 40 genomes that were found as outliers due to re-sequencing commonly used viral stocks by Hill et al 2024

Following outlier format reported in [issue 83](#83 (comment))
@j23414 j23414 linked a pull request Oct 14, 2024 that will close this issue
1 task
@j23414
Contributor

j23414 commented Oct 14, 2024

Thanks @Rohit-Satyam ! Closing this since #84 merged

@j23414 j23414 closed this as completed Oct 14, 2024

2 participants