
--replace-unk does not work as intended #1033

Closed
@bmtm

Description

I ran across an issue in fairseq-interactive: when the source string contains unk tokens, they are not replaced in the output, even though the --replace-unk flag is set.

Example:

| Type the input sentence and press return:
Jack and Jill went up the hill
S-0	Jack and <unk> went up the hill
H-0	-0.9424245357513428	Jack and <unk> went up the hill
P-0	-0.1024 -1.3528 -0.1208 -1.4977 -1.0983 -1.7025 -0.4995 -1.1654
A-0	0 2 2 3 4 6 6 7
H-0	-0.9424245357513428	Jack and <unk> went up the hill
P-0	-0.1024 -1.3528 -0.1208 -1.4977 -1.0983 -1.7025 -0.4995 -1.1654
A-0	0 2 2 3 4 6 6 7

Looking at the code, I think the issue is here: https://github.com/pytorch/fairseq/blob/master/interactive.py#L157

src_str is re-created from src_tokens, which means it already contains the unk token. When post_process_prediction() later tries to replace the unk in the hypothesis, it just replaces it with another unk.

This seems like a bug, but I could be doing something wrong. I've fixed it locally by keeping the original src_str and passing that to post_process_prediction().
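
To make the behavior described above concrete, here is a minimal sketch of alignment-based unk replacement. It is not fairseq's actual code: `replace_unk_sketch` is a hypothetical helper, whitespace tokenization and the appended `<eos>` are simplifying assumptions, and the alignment is copied from the `A-0` line in the example output.

```python
UNK = "<unk>"


def replace_unk_sketch(hypo_str, src_str, alignment, unk=UNK):
    """Hypothetical helper: copy the aligned source token over each unk.

    Assumes whitespace tokenization and that alignment[i] is the index of
    the source token aligned to hypothesis token i.
    """
    hypo_tokens = hypo_str.split()
    # Assumption: the source carries a trailing end-of-sentence token.
    src_tokens = src_str.split() + ["<eos>"]
    for i, token in enumerate(hypo_tokens):
        if token == unk:
            hypo_tokens[i] = src_tokens[alignment[i]]
    return " ".join(hypo_tokens)


# Values taken from the example output in this issue.
alignment = [0, 2, 2, 3, 4, 6, 6, 7]
hypo = "Jack and <unk> went up the hill"

# Source string re-created from src_tokens: "Jill" is already "<unk>".
recreated_src = "Jack and <unk> went up the hill"
# Original source string as typed by the user.
original_src = "Jack and Jill went up the hill"

buggy = replace_unk_sketch(hypo, recreated_src, alignment)
fixed = replace_unk_sketch(hypo, original_src, alignment)

print(buggy)
print(fixed)
```

With the re-created source, the unk is replaced by another unk and the hypothesis comes back unchanged; with the original source string, the aligned word is copied in.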
