feat: Expose CharacterNgrams extractor for direct use in Python #46

@RichardOberdieck

Is your feature request related to a problem? Please describe.
I would like to use a vector database (Qdrant in this case) rather than the HashDb from this repo. To make the results reproducible, I need to create the embeddings with the CharacterNgrams extractor itself. Currently, this is not possible from Python.

Describe the solution you'd like
I would like to be able to call the CharacterNgrams extractor directly from Python, e.g.:

from simstring_rust.extractors import CharacterNgrams

extractor = CharacterNgrams(2, endmarker='$')
embedding = extractor.apply("Some text")
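
For reference, here is a rough pure-Python sketch of the kind of output I would expect from such a call. The function name `character_ngrams` is hypothetical, and the padding rule (n - 1 end markers on each side) is my assumption about how the extractor behaves. The actual Rust implementation may differ, for example in how it treats repeated n-grams, which is exactly why I would prefer to call the real extractor.

```python
def character_ngrams(text: str, n: int = 2, endmarker: str = "$") -> list[str]:
    """Hypothetical stand-in for the requested extractor, NOT the simstring_rust code.

    Assumption: the text is padded with n - 1 end markers on both sides and
    then split into overlapping windows of n characters.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    pad = endmarker * (n - 1)
    padded = f"{pad}{text}{pad}"
    # Slide a window of width n across the padded string.
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


features = character_ngrams("Some text", 2, endmarker="$")
```

With a real `CharacterNgrams` binding, the features fed into the vector database would be guaranteed to match what HashDb uses internally, which this sketch cannot promise.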

Describe alternatives you've considered
N/A

Additional context
N/A

Labels: enhancement (New feature or request)
