Inconsistent entity parsing using GPT-3.5 Turbo 16k #464

Open
francivita opened this issue Apr 26, 2024 · 0 comments
I'm using spacy-llm with GPT-3.5 Turbo 16k for NER (spacy.NER.v2). The pipeline usually works as expected and populates doc.ents, but in some cases doc.ents is empty even though entities are present in the model's raw output (I've set save_io = true to inspect it). This seems to occur when the model lists the entities on separate lines, each prefixed by a hyphen, instead of separating them with commas on a single line.

Examples of Issue:

Incorrectly Parsed Output:
DOC ENTS: ()
Component: llm
Response: Medical Condition:

  - cytotoxicity
  - mutagenicity
  - chromosome damage

Correctly Parsed Output:
DOC ENTS: (coughs, eczema, coughs, coughs, coughs, diabetes)
[ ('coughs', 'Medical Condition'), ('eczema', 'Medical Condition'), ('coughs', 'Medical Condition'), ('coughs', 'Medical Condition'), ('coughs', 'Medical Condition'), ('diabetes', 'Medical Condition')]
Component: llm
Response: Medical Condition: coughs, eczema, diabetes

The parser does not seem to handle model outputs in which the entities are listed under a category, one per line, each prefixed by a hyphen.

Where can I find and modify the parser in the spacy-llm pipeline so that it accounts for variations in entity formatting in the model's output? Specifically, how can it be adjusted to parse hyphen-prefixed entity lists as well as comma-separated ones? Do you have any suggestions?
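
To make the request concrete, here is a minimal sketch of the kind of normalisation I have in mind. It is a standalone helper and not part of spacy-llm: the name `normalize_response` and both regular expressions are my own assumptions, and how (or whether) something like this can be hooked into the NER task's response parsing is exactly what I'm asking about. It rewrites a hyphen- or bullet-prefixed list into the "Label: ent1, ent2" form that is parsed correctly in the second example above.

```python
import re

# Hypothetical helper, NOT part of spacy-llm. It only rewrites the raw
# response text; it does not touch the pipeline or create any spans.
_BULLET = re.compile(r"^\s*[-•*]\s+(.*\S)\s*$")   # "- entity" / "• entity"
_LABEL = re.compile(r"^\s*([^:]+):\s*(.*)$")      # "Label: a, b" or "Label:"


def normalize_response(response: str) -> str:
    """Collapse bulleted entity lists into 'Label: ent1, ent2' lines."""
    groups = []  # list of [label, [entities]] in order of appearance
    for line in response.splitlines():
        if not line.strip():
            continue  # skip blank lines between the label and its bullets
        bullet = _BULLET.match(line)
        if bullet and groups:
            # A bulleted entity belongs to the most recent label.
            groups[-1][1].append(bullet.group(1))
            continue
        label = _LABEL.match(line)
        if label:
            # Keep any comma-separated entities already on the label line.
            inline = [e.strip() for e in label.group(2).split(",") if e.strip()]
            groups.append([label.group(1).strip(), inline])
    return "\n".join(f"{label}: {', '.join(ents)}" for label, ents in groups)


if __name__ == "__main__":
    raw = "Medical Condition:\n\n  - cytotoxicity\n  - mutagenicity\n  - chromosome damage"
    print(normalize_response(raw))
```

Responses that are already comma-separated pass through unchanged, so the helper could in principle be applied to every response before parsing.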
