Inconsistent entity parsing using GPT-3.5 Turbo 16k

I'm using spacy-llm with GPT-3.5 Turbo 16k for NER (`spacy.NER.v2`). The pipeline usually works as expected and populates `doc.ents`, but sometimes `doc.ents` is empty even though the model's raw output (logged with `save_io = true`) contains entities. This seems to happen when the model lists the entities as hyphen-prefixed lines under the label instead of as a comma-separated list on the same line.

**Examples of Issue:**

**Incorrectly Parsed Output:**
DOC ENTS: ()
Component: llm
Response: Medical Condition:
 -  cytotoxicity
 -  mutagenicity
 -  chromosome damage

**Correctly Parsed Output:**
DOC ENTS: (coughs, eczema, coughs, coughs, coughs, diabetes)
[('coughs', 'Medical Condition'), ('eczema', 'Medical Condition'), ('coughs', 'Medical Condition'), ('coughs', 'Medical Condition'), ('coughs', 'Medical Condition'), ('diabetes', 'Medical Condition')]
Component: llm
Response: Medical Condition: coughs, eczema, diabetes
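In the correctly parsed case the response names `coughs` only once, yet `doc.ents` contains it four times. That suggests each phrase returned by the model is matched against every occurrence in the text. A minimal pure-Python sketch of such a matching step, assuming case-insensitive matching of all occurrences (`find_all_spans` is a hypothetical helper, not part of spacy-llm):

```python
import re
from typing import Dict, List, Tuple


def find_all_spans(text: str, entities: Dict[str, List[str]]) -> List[Tuple[int, int, str]]:
    """Return sorted (start, end, label) character offsets for every occurrence of every phrase."""
    spans = []
    for label, phrases in entities.items():
        for phrase in phrases:
            # re.escape matches the phrase literally; \b keeps "cough" from matching
            # inside "coughs" (assumes phrases start and end with word characters).
            pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
            for m in pattern.finditer(text):
                spans.append((m.start(), m.end(), label))
    return sorted(spans)


text = ("She reported coughs and eczema; the coughs got worse, "
        "then more coughs and coughs. History of diabetes.")
spans = find_all_spans(text, {"Medical Condition": ["coughs", "eczema", "diabetes"]})
print([text[s:e] for s, e, _ in spans])
```

With one phrase list and a text that repeats a phrase, the result repeats the entity in text order, which matches the shape of the `DOC ENTS` output above.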

The parser does not seem to handle outputs where the entities are listed on separate lines under the label, each prefixed with a hyphen.
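Both outcomes are consistent with a parser that reads the response line by line, expects `Label: phrase, phrase, ...` on a single line, and splits on commas. The following is a simplified, hypothetical model of such a parser (not spacy-llm's actual source) that reproduces both results:

```python
from typing import Dict, List


def parse_comma_lines(response: str, labels: List[str]) -> Dict[str, List[str]]:
    """Simplified model: only 'Label: a, b, c' on one line is understood."""
    known = {label.lower(): label for label in labels}
    output: Dict[str, List[str]] = {}
    for line in response.strip().split("\n"):
        if ":" not in line:
            # Bullet lines such as " -  cytotoxicity" contain no colon and are skipped.
            continue
        head, phrases = line.split(":", 1)
        label = known.get(head.strip().lower())
        if label is None or not phrases.strip():
            # "Medical Condition:" with nothing after the colon yields no phrases.
            continue
        output[label] = [p.strip() for p in phrases.split(",") if p.strip()]
    return output


labels = ["Medical Condition"]
print(parse_comma_lines("Medical Condition: coughs, eczema, diabetes", labels))
print(parse_comma_lines("Medical Condition:\n -  cytotoxicity\n -  mutagenicity", labels))
```

Under this model the bullet-style response produces an empty result, which would leave `doc.ents` empty.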

Where can I find and modify the parser in the spacy-llm pipeline so that it handles variations in the model's output format? Specifically, how can it be adjusted to accept hyphen-prefixed entity lists as well as comma-separated ones? Any suggestions?
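Wherever the parsing ends up being customized (the exact place depends on the spacy-llm version and task), the core change is small: treat a bare `Label:` line as opening a block and collect the following bullet lines under that label, while keeping the existing comma handling. A minimal, framework-independent sketch, assuming bullets start with `-`, `*` or `•` (`parse_flexible` is a hypothetical name):

```python
import re
from typing import Dict, List, Optional

# Bullet markers assumed to appear at the start of a (stripped) line.
BULLET = re.compile(r"^\s*[-*•]\s*")


def parse_flexible(response: str, labels: List[str]) -> Dict[str, List[str]]:
    """Accept both 'Label: a, b' and a 'Label:' line followed by bullet lines."""
    known = {label.lower(): label for label in labels}
    output: Dict[str, List[str]] = {}
    current: Optional[str] = None  # label whose bullet block we are inside
    for raw in response.strip().split("\n"):
        line = raw.strip()
        if not line:
            continue
        if BULLET.match(line):
            # Bullet line: attach it to the most recent known label, if any.
            if current is not None:
                phrase = BULLET.sub("", line, count=1).strip()
                if phrase:
                    output.setdefault(current, []).append(phrase)
            continue
        if ":" in line:
            head, rest = line.split(":", 1)
            current = known.get(head.strip().lower())
            if current is not None and rest.strip():
                # Same-line, comma-separated form.
                output.setdefault(current, []).extend(
                    p.strip() for p in rest.split(",") if p.strip()
                )
        else:
            # Any other text ends the current bullet block.
            current = None
    return output


labels = ["Medical Condition"]
print(parse_flexible("Medical Condition:\n -  cytotoxicity\n -  chromosome damage", labels))
```

Keeping the comma branch means responses that already parse correctly produce the same result as before; only the bullet layout gains new handling.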

Inconsistent entity parsing using GPT-3.5 Turbo 16k #464