Include Custom Attributes in Doc.to_array() #5382
Can custom attributes (accessible through the "_" keyspace) be included in the results of a Doc.to_array() call? This question was asked in #2072 but was not explicitly answered. It looks like the answer is no, and that the user should append entries to the numpy array after calling Doc.to_array().
Replies: 1 comment
No, not easily, because it's possible to have custom attribute values that can't be represented as integers.
I think you might run into problems with .from_array() if you have appended values in your array (I don't think it has an option to ignore particular columns when loading), but if you manage the details for reloading the docs yourself, then it's certainly an option.
If you want to store large numbers of docs at once, have you looked at using DocBin with the store_user_data option? https://spacy.io/usage/saving-loading#docs The msgpack serialization of custom attributes should be pretty efficient for built-in Python types.
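As a rough illustration of the DocBin suggestion, here is a minimal sketch of round-tripping custom attributes. The extension names score and source, the sample text, and the blank English pipeline are made up for this example and are not from the thread. It assumes a spaCy version where DocBin accepts store_user_data, and it passes store_user_data=True on the loading side as well so the user data is restored.

```python
import spacy
from spacy.tokens import Doc, DocBin, Token

# Hypothetical custom attributes, registered only for this illustration.
if not Token.has_extension("score"):
    Token.set_extension("score", default=None)
if not Doc.has_extension("source"):
    Doc.set_extension("source", default=None)

nlp = spacy.blank("en")
doc = nlp("Custom attributes travel with the doc.")
doc._.source = "example.txt"
doc[0]._.score = 0.75  # a float, so not representable in an integer array

# Collect docs together with their user data (where "_" values are stored).
doc_bin = DocBin(store_user_data=True)
doc_bin.add(doc)
data = doc_bin.to_bytes()

# Load again; store_user_data=True is passed here too so the custom
# attribute values are restored onto the recreated docs.
restored_bin = DocBin(store_user_data=True).from_bytes(data)
restored = list(restored_bin.get_docs(nlp.vocab))[0]

print(restored._.source, restored[0]._.score)
```

The extensions have to be registered again in any new process before the restored values are read, since only the values are serialized, not the extension definitions.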