
dm_concat mode & corpus with '\0' tokens gives error #684

Open
@gojomo

Description

See https://groups.google.com/d/msg/gensim/8r0GOGif56U/KJ4mmQo6KQAJ

The null word is created without checking whether '\0' already occurs in the corpus, so it seems to clobber information that is needed when '\0' must also be a predicted word.

Since adding the null word is a final step, it can probably notice that such a word already exists, and perhaps log a warning that the same token is playing two roles (whatever it means in the corpus, plus the special null_word placeholder value).
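A minimal sketch of that check, assuming a simplified vocabulary made of a plain `dict` (token to index) and a parallel list. The function name `add_null_word_safely` and both data structures are hypothetical stand-ins for illustration, not gensim's actual internals:

```python
import logging

logger = logging.getLogger(__name__)

NULL_WORD = "\0"  # the special placeholder token discussed in this issue


def add_null_word_safely(vocab, index2word):
    """Ensure NULL_WORD is in the vocabulary and return its index.

    `vocab` maps token -> index and `index2word` is the reverse list.
    Both are simplified, hypothetical structures, not gensim's own.
    """
    if NULL_WORD in vocab:
        # The corpus already contains '\0': keep the existing entry
        # instead of overwriting it, and warn about the dual role.
        logger.warning(
            "token %r already exists in the corpus vocabulary; "
            "it will also be used as the null word",
            NULL_WORD,
        )
        return vocab[NULL_WORD]

    # Normal case: append the null word as a new final entry.
    vocab[NULL_WORD] = len(index2word)
    index2word.append(NULL_WORD)
    return vocab[NULL_WORD]
```

Reusing the existing entry keeps whatever index the corpus already assigned to '\0', so nothing is clobbered. Whether sharing one entry between the two roles is acceptable for training is a separate question that this sketch does not settle.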


Labels: difficulty medium, documentation
