-
Notifications
You must be signed in to change notification settings - Fork 98
Allow QuickUMLS to be used as component in spacy pipeline #57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…spacy pipeline. This can be used as an entity matcher for UMLS concepts in other modular spacy pipelines. This is already being used in an operational capacity for syndromic surveillance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @burgersmoke,
Thank you so much for this! It looks awesome; I've left a couple minor comments. Could you also add a section in README.md
with information about how to use this? personally, I'd move spacy_example_pipeline.py
a section at the bottom of the readme file, but I'll leave it up to you.
Thank you again, and looking forward to merging this changes in! This is a feature big enough to roll it out as v.1.5
Great. I appreciate all of this feedback since I know this was a rough PR.
Nevermind. I did not understand the previous comment, but understood it later. I've moved the example code to the README instead of the example .py file. Now that I have your feedback, I'll make these changes tonight or Tuesday and update the PR. |
Went through it quickly, seems nice for first iteration ! Thanks for work. |
…n QuickUMLS and SpacyQuickUMLS. Fixed keyword arguments to QuickUMLS when creating a spacy component. Added documentation as well. Removing previous standalone example Python file and instead adding an example of a QuickUMLS spacy pipleline to the README.
OK, I think this is ready to go now with |
Any update on this? I believe I've resolved the code issue and remaining questions. Are there any other questions about this change I can answer? I'm wanting to move this forward since we're planning to leverage this to make medspacy and QuickUMLS better with some mutually beneficial upcoming work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay in reviewing this! The new changes look good to me.
Fantastic. Thank you @soldni ! |
@burgersmoke thank you for your hard work! will push out a proper release (v.1.5) momentarily. |
My pleasure. There's more to come as QuickUMLS is an important part of our plans for medspacy. I'll continue to follow up with you and anyone else interested. |
I was doing some tests and notice that, in the latest version of spaCy (2.3), I get an error if two entities overlap:
According to this GitHub issue, this seems to be the expected behavior since at least spaCy (2.1.3) I am planning to solve this by either (a) adding a custom extension type called for ent in doc._.quickumls:
print(ent.text, ent.label_, ent._.semtypes, ent._.similarity) In the latter, you'd use the following syntax: for ent in doc.ents:
for match in ent._.quickumls:
print(ent.text, match.cui, match.semtypes, match.similarity) I personally prefer the second one, as it makes more sense to me to just have an entity with multiple labels, but please do let me know which one you think would make more sense. |
You are absolutely correct and I debated raising it earlier, but I wanted to take steps towards the component being included first so that we could have this conversation. I've opened this as a new issue here. I'll add my initial thoughts here: |
This can be used as an entity matcher for UMLS concepts in other modular spacy pipelines. This is already being used in an operational capacity for syndromic surveillance in a healthcare system. More info on this can be provided upon request.
This is in response to this issue:
#40