Is your feature request related to a problem? Please describe.
I want to use refinery to label for information extraction, but cannot upload my existing labels, which sets me back in my project by a large margin.
Describe the solution you'd like
I want to tokenize my data in a notebook with the same tokenizer that refinery uses, and then match my labels to the respective tokens. Technically, this would be realised through a JSON attribute, e.g. label__headline__MANUAL, whose value is a list with one label per token, e.g. ["0", "0", "PERSON", "0"] (the "0" could also be null or anything else that is specified in the docs). I want to upload this data to refinery. During the tokenization process, I want refinery to tell me if the output of its internal tokenizer and my pre-tokenized data do not match. If so, there are two levels of complexity I can imagine:
simple: it should stop the tokenization process and throw an error that the tokenization did not match my pre-provided tokens (in length)
medium: it should additionally tell me which record caused this and what the two lengths were (e.g. refinery produced 200 tokens while I only provided a list of 193 labels)
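To make the request more concrete, here is a rough sketch of the "medium" check in Python. It is only an illustration of the behaviour I have in mind, not refinery code: `whitespace_tokenize` is a stand-in for whatever tokenizer refinery uses internally, and `TokenLabelMismatch`, `validate_records` and the example records are hypothetical names and data I made up for this issue.

```python
import json
from typing import Callable, Dict, Iterable, List


class TokenLabelMismatch(ValueError):
    """Raised when a record's label list length differs from its token count."""

    def __init__(self, record_index: int, n_tokens: int, n_labels: int):
        self.record_index = record_index
        self.n_tokens = n_tokens
        self.n_labels = n_labels
        super().__init__(
            f"record {record_index}: tokenizer produced {n_tokens} tokens, "
            f"but {n_labels} labels were provided"
        )


def whitespace_tokenize(text: str) -> List[str]:
    # Assumption: a placeholder tokenizer, NOT the one refinery uses.
    return text.split()


def validate_records(
    records: Iterable[Dict],
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
    attribute: str = "headline",
    label_key: str = "label__headline__MANUAL",
) -> None:
    """Stop at the first record whose label count does not match its token count."""
    for index, record in enumerate(records):
        n_tokens = len(tokenize(record[attribute]))
        n_labels = len(record[label_key])
        if n_tokens != n_labels:
            # "medium" level: report the record and both lengths.
            raise TokenLabelMismatch(index, n_tokens, n_labels)


# Hypothetical upload payload: one label per token, "0" meaning "no label".
records = [
    {
        "headline": "Yesterday Ada Lovelace spoke",
        "label__headline__MANUAL": ["0", "PERSON", "PERSON", "0"],
    },
]

validate_records(records)  # passes silently when the lengths agree
payload = json.dumps(records)  # what I would then upload
```

The "simple" level would be the same check with a generic error message. Carrying the record index and both lengths on the exception is what would let me find and fix the offending record in my notebook.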
Describe alternatives you've considered
hacking the project import/export functionality, which is rather complicated.
Additional context