KeyError when serializing a doc object after adding a new entity label #514

@emsrc

I'm trying to add a new entity label and then set entity spans that use it. However, calling doc.to_bytes() afterwards raises a KeyError. Minimal code example below:

# python3 + spacy 0.101.0

import spacy

nlp = spacy.load('en')

doc = nlp('This is a sentence about pasta.')

label = 'Food'
nlp.entity.add_label(label)
label_id = nlp.vocab.strings[label]

print(label_id)

doc.ents = [(label_id, 5, 6)]

print(doc.ents)

byte_string = doc.to_bytes()

Output:

6832
(pasta,)
Traceback (most recent call last):
  File "/Users/work/Projects/ScienceIE/scienceie17/exps/crf0/minimal.py", line 18, in <module>
    byte_string = doc.to_bytes()
  File "spacy/tokens/doc.pyx", line 418, in spacy.tokens.doc.Doc.to_bytes (spacy/tokens/doc.cpp:10687)
  File "spacy/serialize/packer.pyx", line 110, in spacy.serialize.packer.Packer.pack (spacy/serialize/packer.cpp:5687)
  File "spacy/serialize/huffman.pyx", line 61, in spacy.serialize.huffman.HuffmanCodec.encode (spacy/serialize/huffman.cpp:2535)
KeyError: 6832
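
One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the error surfaces in HuffmanCodec.encode, and a Huffman code table built from a fixed set of symbols has no entry for an ID that appears later. The toy sketch below is not spaCy's implementation; `build_codes`, `encode`, and the frequency table are hypothetical names and values chosen only to show how such a lookup can fail with the same kind of KeyError.

```python
import heapq
from itertools import count


def build_codes(freqs):
    """Build a symbol -> bit-string table from a fixed frequency table."""
    tiebreak = count()  # unique counter so heap entries never compare dicts
    heap = [(weight, next(tiebreak), {symbol: ''})
            for symbol, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 / 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in left.items()}
        merged.update({s: '1' + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def encode(codes, symbols):
    # Plain dict lookup: a symbol missing from the table raises KeyError.
    return ''.join(codes[s] for s in symbols)


# Hypothetical label IDs and counts, fixed at the time the table is built.
known_label_freqs = {0: 90, 1: 6, 2: 4}
codes = build_codes(known_label_freqs)

print(encode(codes, [0, 0, 1]))  # every ID is in the table, so this works

try:
    encode(codes, [0, 6832])  # 6832 stands in for a label added afterwards
except KeyError as err:
    print('KeyError:', err)
```

If this reading is right, the new label ID would need to be known to the codec before serialization; whether and how spaCy 0.101.0 allows that is not something this sketch can answer.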

Labels: bug (Bugs and behaviour differing from documentation)
