Description
Hi, I'm the author of the R package 'icd', and I'm glad to see that several of us have worked on solving the comorbidity computation problem. I just noticed your package today. Also, glad to see you live in my home country!
cc @patrickmdnet who is the author of 'medicalrisk'
I took some time this morning to compare the comorbidity computations between our packages, in both speed and content. I was distressed to see that we all differed from each other, particularly in the COPD/chronic lung disease and cancer/tumor categories. I dug into your source code and noticed that you grep for descendants of a non-existent top-level code (498), which gives a false positive for chronic lung disease with a random test code, 498.82. It is an open question what we should all do when codes appear that are syntactically valid but do not actually exist, particularly as different annual revisions may gain or lose codes, and we would probably want to sweep them all up when looking for comorbidities.
In 'icd' I took the view that I would count non-existent descendants of extant codes, but would exclude codes that have no parent associated with any comorbidity.
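To make the difference concrete, here is a minimal sketch of the two matching rules. It is written in Python purely for illustration and is not the code of either package; the code table, the comorbidity map, and the function names are all hypothetical toy values, with "498" standing in for a parent that does not exist.

```python
# Hypothetical toy data: NOT a real ICD-9 table or a published comorbidity map.
EXISTING_CODES = {"491", "4912", "49121", "496"}
# "498" stands in for a top-level code that is listed in a map but does not exist.
MAP_PARENTS = {"chronic_lung": {"491", "496", "498"}}


def normalize(code):
    """Drop the decimal point so '491.21' and '49121' compare equal."""
    return code.replace(".", "").strip()


def naive_match(code, parents):
    """Match any code that starts with a listed parent, whether or not the parent exists."""
    c = normalize(code)
    return any(c.startswith(p) for p in parents)


def strict_match(code, parents, existing):
    """Match descendants (real or not) of parents that exist; ignore non-existent parents."""
    c = normalize(code)
    return any(c.startswith(p) for p in parents if p in existing)


lung = MAP_PARENTS["chronic_lung"]
print(naive_match("498.82", lung))                    # matches via the non-existent parent
print(strict_match("498.82", lung, EXISTING_CODES))   # rejected: parent 498 does not exist
print(strict_match("491.99", lung, EXISTING_CODES))   # accepted: non-existent child of extant 491
```

Under these assumptions, 498.82 is a false positive only for the naive rule, while an undefined child of a real parent such as 491.99 is still swept up by the strict rule.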
I didn't look into the cancer side, but there were many more discrepancies which I suspect to be of the same origin.
Can I suggest randomly generating strings for testing? You can see I do this in the icd source code. I see you generated test data by sampling only valid codes.
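As a rough sketch of what I mean (in Python for illustration only, and not a copy of what 'icd' does), the generator below produces strings that merely have the shape of numeric ICD-9 codes, so most of them will not be real codes. The function name, the shape pattern, and the choice to ignore V and E codes are assumptions made for this example.

```python
import random
import re

# Assumed shape of a numeric ICD-9 code: three digits, optionally a dot and 1-2 digits.
ICD9_SHAPE = re.compile(r"^\d{3}(\.\d{1,2})?$")


def random_icd9_like(n, seed=0):
    """Return n random strings shaped like numeric ICD-9 codes (V and E codes are ignored)."""
    rng = random.Random(seed)  # seeded so that a failing test case can be reproduced
    codes = []
    for _ in range(n):
        major = f"{rng.randint(1, 999):03d}"
        minor_len = rng.randint(0, 2)
        minor = "".join(rng.choice("0123456789") for _ in range(minor_len))
        codes.append(major + ("." + minor if minor else ""))
    return codes


sample = random_icd9_like(5, seed=42)
print(sample)
```

Feeding such strings through each package and comparing the resulting comorbidity flags should surface codes like 498.82 that sampling from a list of valid codes would never produce.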
One way for us all to work together might be for you to continue to implement comorbidities how you wish, and consider importing the 'icd' package for validation and explanation of actual codes, which I've put a lot of time into. I'm open to considering other ways for us to collaborate.
Best wishes,
Jack