-
Notifications
You must be signed in to change notification settings - Fork 1
Fix issues with references to spans #8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issues with references to spans #8
Conversation
6d6b74c to
096d9c5
Compare
096d9c5 to
f51e8c6
Compare
Helps with debugging
f51e8c6 to
99398ca
Compare
c942446 to
42f270c
Compare
|
|
||
| spans = [] | ||
| spans.append(self.create_span(tokens, span["token_start"], start_label)) | ||
| spans.append(self.create_span(tokens, span["token_end"], end_label)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah so was this line the cause of the bug in the case where span_size = 0 then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep precisely! It was a case that had never come up before because references were always longer than one token.
lizgzil
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍 make test was successful.
This looks like a long PR, but it is mostly test data that has been added to
tests/prodigy/test_datato provide more realistic examples when testing conversion of reference annotations to token annotations. This PR:prodigy_to_tsvandrefs_to_token_annotationscommandssplit_long_spanmethod in theTokenTaggerclass which was causing some token level spans to be repeated when the reference span was of length = 1.