Contributor

@ivyleavedtoadflax ivyleavedtoadflax commented Feb 25, 2020

This looks like a long PR, but it is mostly test data that has been added to tests/prodigy/test_data to provide more realistic examples when testing conversion of reference annotations to token annotations. This PR:

  • Adds new test data and tests
  • Adds new CLI arguments to the prodigy_to_tsv and refs_to_token_annotations commands
  • Fixes an issue with the split_long_span method in the TokenTagger class that caused some token-level spans to be repeated when the reference span was only one token long.
  • Formats tests using black.

@ivyleavedtoadflax ivyleavedtoadflax force-pushed the feature/ivyleavedtoadflax/prodigy_to_tsv branch from 6d6b74c to 096d9c5 Compare February 25, 2020 21:11
@ivyleavedtoadflax ivyleavedtoadflax force-pushed the feature/ivyleavedtoadflax/prodigy_to_tsv branch from 096d9c5 to f51e8c6 Compare February 25, 2020 22:18
@ivyleavedtoadflax ivyleavedtoadflax force-pushed the feature/ivyleavedtoadflax/prodigy_to_tsv branch from f51e8c6 to 99398ca Compare February 25, 2020 22:37
@ivyleavedtoadflax ivyleavedtoadflax force-pushed the feature/ivyleavedtoadflax/prodigy_to_tsv branch from c942446 to 42f270c Compare February 26, 2020 21:48
@ivyleavedtoadflax ivyleavedtoadflax changed the title from "new: Accept new arguments in prodigy_to_tsv command" to "Fix issues with references to spans" Feb 26, 2020
@ivyleavedtoadflax ivyleavedtoadflax marked this pull request as ready for review February 26, 2020 22:03
@ivyleavedtoadflax ivyleavedtoadflax changed the title from "Fix issues with references to spans" to "Fix issues with references to spans (Fixes https://github.com/wellcometrust/datalabs/issues/605)" Feb 26, 2020
@ivyleavedtoadflax ivyleavedtoadflax changed the title from "Fix issues with references to spans (Fixes https://github.com/wellcometrust/datalabs/issues/605)" to "Fix issues with references to spans" Feb 26, 2020

spans = []
spans.append(self.create_span(tokens, span["token_start"], start_label))
spans.append(self.create_span(tokens, span["token_end"], end_label))
Contributor

ah so was this line the cause of the bug in the case where span_size = 0 then?

Contributor Author

yep precisely! It was a case that had never come up before because references were always longer than one token.
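
For anyone reading this thread later, here is a minimal sketch of why unconditionally appending both a start span and an end span repeats the token when a reference covers a single token. It is not the code from this PR: create_span below is a simplified stand-in for the TokenTagger method, the "b-r" and "e-r" labels are placeholders, and the guard in boundary_spans_guarded is only one assumed way to avoid the repeat.

```python
def create_span(tokens, index, label):
    """Build a one-token span dict (hypothetical stand-in for TokenTagger.create_span)."""
    return {
        "start": tokens[index]["start"],
        "end": tokens[index]["end"],
        "token_start": index,
        "token_end": index,
        "label": label,
    }


def boundary_spans_naive(tokens, span, start_label="b-r", end_label="e-r"):
    # Mirrors the lines under review: always emit a start span and an end span.
    spans = []
    spans.append(create_span(tokens, span["token_start"], start_label))
    spans.append(create_span(tokens, span["token_end"], end_label))
    return spans


def boundary_spans_guarded(tokens, span, start_label="b-r", end_label="e-r"):
    # Assumed fix: emit the end span only when it is a different token.
    spans = [create_span(tokens, span["token_start"], start_label)]
    if span["token_end"] != span["token_start"]:
        spans.append(create_span(tokens, span["token_end"], end_label))
    return spans


# Made-up tokens in a Prodigy-like shape; the reference covers only token 1.
tokens = [
    {"text": "Smith", "start": 0, "end": 5, "id": 0},
    {"text": "2020", "start": 6, "end": 10, "id": 1},
]
single = {"token_start": 1, "token_end": 1}

naive = boundary_spans_naive(tokens, single)      # token 1 appears twice
guarded = boundary_spans_guarded(tokens, single)  # token 1 appears once
```

With the one-token reference, the naive version returns two spans that both point at token 1 (one labelled as a start, one as an end), while the guarded version returns a single span. For references longer than one token the two versions return the same result.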

Contributor

@lizgzil lizgzil left a comment

LGTM 👍 make test was successful.

@ivyleavedtoadflax ivyleavedtoadflax merged commit dcea5e4 into master Feb 27, 2020
@ivyleavedtoadflax ivyleavedtoadflax deleted the feature/ivyleavedtoadflax/prodigy_to_tsv branch February 27, 2020 12:25