Sentence categorization doesn't work using CLI train, json_to_tuple issue #5299
Replies: 2 comments
-
Sorry, the docs are out-of-date here. Supporting both versions of cats was the initial plan, but the The idea of supporting subdocument cats could make sense for training with the current implementation, but an evaluation that uses this format gets very complicated: do you evaluate every possible span in an eval document? What kind of summary score would you provide for a whole document after evaluating every single span? Your task is reasonable, but unfortunately not a perfect match for either the current See #3961 for related discussion. Suggestions/help are very welcome!
-
Thank you for the reply and the link to the related discussion. I'm new to spaCy, ML, NLP, etc., and I'm mostly trying different things to figure out how it all works and whether spaCy and NLP can solve the problem I have. It is quite possible that the approach I've chosen is not right. What I'd like to achieve is to find the ingredient sentences in a recipe and then, in each sentence/span, find the ingredient name, amount and unit of measurement. I want to use NER for the ingredient name, amount and unit of measurement, and I wanted to use sentence categorization for the ingredient sentences/lines. If categorization doesn't work for sentences yet, it seems I can create two models: the first would use NER to label ingredient sentences/spans, and the second would identify the ingredient name, amount and unit of measurement within those sentences. Please kindly advise whether I'm on the right track or should choose another approach. Or maybe I have to use totally different tools for that :)
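To make the two-step idea concrete, here is a rough sketch of the control flow only. Both steps are stand-ins: `is_ingredient_line` and `extract_fields` are hypothetical placeholders (simple regex rules and a made-up unit list) marking where a trained model for finding ingredient lines and an NER model would go. Nothing below is spaCy API.

```python
import re

# Hypothetical unit list, for illustration only.
UNITS = {"cup", "cups", "tbsp", "tsp", "g", "kg", "ml", "l"}

# A leading amount followed by the rest of the line.
LINE_RE = re.compile(r"^\s*(?P<amount>\d+(?:[.,/]\d+)?)\s+(?P<rest>\S.*)$")


def is_ingredient_line(line):
    """Step 1 placeholder: a trained model would decide this instead."""
    return LINE_RE.match(line) is not None


def extract_fields(line):
    """Step 2 placeholder: an NER model would label these fields instead."""
    match = LINE_RE.match(line)
    words = match.group("rest").split()
    unit = words[0] if words[0].lower() in UNITS else None
    name = " ".join(words[1:] if unit else words)
    return {"amount": match.group("amount"), "unit": unit, "name": name}


def parse_recipe(text):
    """Run step 2 only on the lines that step 1 accepts."""
    return [
        extract_fields(line)
        for line in text.splitlines()
        if is_ingredient_line(line)
    ]


recipe = "Pancakes\n2 cups flour\n3 eggs\nMix everything and fry."
parsed = parse_recipe(recipe)
```

The point of the split is that the second step never sees non-ingredient text, so each model can be trained and evaluated on its own.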
-
How to reproduce the behaviour
I'm trying to check how sentence categorization works. Unfortunately, I experience an issue when trying to train my model using CLI train command:
According to the GoldParse documentation, the cats attribute is
In my json I have cats specified like this:
I checked how json_to_tuple works and found out that it doesn't handle tuples as a category label here:
spaCy/spacy/gold.pyx
Line 515 in 6a8a526
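The underlying problem can be shown without spaCy. JSON has no tuple type, so a span-style label written as an array is parsed into a Python list, and a list cannot be used as a dictionary key. The sketch below is not spaCy's code: the `[start, end, label]` layout, the `"value"` field, and the helpers `normalize_label` and `collect_cats` are assumptions made up for illustration.

```python
import json

# Hypothetical training-data fragment: a span-level category is written as
# a JSON array because JSON has no tuple type.
raw = (
    '{"cats": ['
    '{"label": [0, 12, "INGREDIENT"], "value": 1.0}, '
    '{"label": "RECIPE", "value": 1.0}'
    ']}'
)
paragraph = json.loads(raw)


def normalize_label(label):
    """Return a hashable label: lists become tuples, strings pass through."""
    return tuple(label) if isinstance(label, list) else label


def collect_cats(paragraph):
    """Build a {label: value} dict from a list of cats entries."""
    cats = {}
    for cat in paragraph.get("cats", []):
        cats[normalize_label(cat["label"])] = cat["value"]
    return cats


# Using the parsed list directly as a dict key raises TypeError.
try:
    {paragraph["cats"][0]["label"]: 1.0}
    list_key_failed = False
except TypeError:
    list_key_failed = True

cats = collect_cats(paragraph)
```

Whether such a conversion alone would be enough for training and evaluation downstream is a separate question.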
Not sure whether it is enough to check if cat['label'] is a list and convert it into a tuple on this line.
Your Environment
Info about spaCy