Allow reserved IDs for token2id in Dictionary #2190

Open
@Froskekongen

Description

The dictionary now starts indexing words at 0, which may be unwanted for several applications. For example, in my application I want the first word to have index 1 and to reserve 0 for padding. Similarly, one may want to reserve certain IDs for unknown tokens, special delimiter tokens, etc., so that they remain unaffected by, e.g., filter_extremes and compactify. What I propose is the ability to supply reserved IDs to Dictionary that remain unaffected when new documents are added and when other Dictionary methods are called.
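To make the request concrete, here is a rough, self-contained sketch of the behaviour I have in mind. It does not use gensim at all: `ReservedVocab`, `remove_tokens`, and the `<pad>`/`<unk>` token names are hypothetical and only for illustration, and its `compactify` merely mimics the idea of closing ID gaps while leaving the reserved IDs alone.

```python
from itertools import count


class ReservedVocab:
    """Hypothetical token-to-ID mapping with fixed, reserved IDs (not gensim's API)."""

    def __init__(self, reserved=None):
        # `reserved` maps special tokens to the IDs they must always keep,
        # e.g. {"<pad>": 0, "<unk>": 1}.
        self.reserved = dict(reserved or {})
        if len(set(self.reserved.values())) != len(self.reserved):
            raise ValueError("reserved IDs must be unique")
        self.token2id = dict(self.reserved)

    def _free_ids(self):
        # Yield 0, 1, 2, ... skipping every ID that is already taken.
        taken = set(self.token2id.values())
        return (i for i in count() if i not in taken)

    def add_documents(self, documents):
        # New tokens get the lowest free IDs; reserved IDs are never reused.
        free = self._free_ids()
        for doc in documents:
            for token in doc:
                if token not in self.token2id:
                    self.token2id[token] = next(free)

    def remove_tokens(self, tokens):
        # Stand-in for filtering: reserved tokens are silently kept.
        for token in tokens:
            if token not in self.reserved:
                self.token2id.pop(token, None)

    def compactify(self):
        # Renumber ordinary tokens to close gaps, preserving their order
        # and leaving every reserved token on its original ID.
        ordinary = sorted(
            (t for t in self.token2id if t not in self.reserved),
            key=self.token2id.get,
        )
        self.token2id = dict(self.reserved)
        free = self._free_ids()
        for token in ordinary:
            self.token2id[token] = next(free)


vocab = ReservedVocab({"<pad>": 0, "<unk>": 1})
vocab.add_documents([["human", "interface", "computer"], ["graph", "trees"]])
vocab.remove_tokens(["interface", "<pad>"])  # "<pad>" is reserved, so it stays
vocab.compactify()
print(vocab.token2id)
```

With this kind of behaviour, ordinary words start at the first non-reserved ID (2 here), and removing tokens followed by compacting never moves `<pad>` or `<unk>`.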

Metadata

Labels

feature (issue describes a new feature)
