Ability to obtain word count / word frequency from pretrained word vector corpus #5232
Unanswered
aced125 asked this question in Help: Coding & Implementations
Replies: 1 comment
Some of the provided spaCy md/lg models (German, Spanish, English, Greek) include word probabilities that come from a different source than the vectors, and you can access them per token.
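As a rough illustration of how such a per-token probability could be turned into a frequency or count, here is a minimal sketch. It assumes the value is a natural-log probability (spaCy documents `Token.prob` as a smoothed log probability estimate; whether that is the attribute meant above is an assumption). The function names, the toy values, and the corpus size are hypothetical and not taken from any model.

```python
import math


def log_prob_to_probability(log_prob: float) -> float:
    """Convert a natural-log probability back to a probability in (0, 1]."""
    return math.exp(log_prob)


def estimated_count(log_prob: float, corpus_size: int) -> float:
    """Rough expected count in a corpus of `corpus_size` tokens.

    `corpus_size` is a hypothetical number supplied by the caller; the
    models do not necessarily expose the size of the source corpus.
    """
    return math.exp(log_prob) * corpus_size


# Toy log probabilities for illustration only (not real model output).
toy_log_probs = {"the": -3.5, "vector": -12.0}

for word, log_prob in toy_log_probs.items():
    probability = log_prob_to_probability(log_prob)
    count = estimated_count(log_prob, corpus_size=1_000_000)
    print(f"{word}: p={probability:.6f}, estimated count={count:.1f}")
```

Because the probabilities come from a different source than the vectors, any count derived this way describes that other corpus, not the corpus the vectors were trained on.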
Feature description
The idea would be to obtain word frequencies for pretrained vectors, e.g. GloVe vectors.
This would allow computing frequency-weighted sentence vectors, for example SIF embeddings (https://openreview.net/pdf?id=SyK00v5xx).
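To make the request concrete, here is a minimal sketch of the weighting step from the SIF paper, where each word vector is scaled by a / (a + p(w)) before averaging. It is a simplified illustration only: it omits the paper's second step (removing the projection onto the first principal component of the sentence vectors), and the function names, the plain-dict inputs, and the default `a = 1e-3` are assumptions for this example, not a spaCy API.

```python
from typing import Dict, List, Sequence


def sif_weight(probability: float, a: float = 1e-3) -> float:
    """SIF weight a / (a + p(w)): frequent words get smaller weights."""
    return a / (a + probability)


def weighted_sentence_vector(
    tokens: Sequence[str],
    vectors: Dict[str, List[float]],
    probabilities: Dict[str, float],
    a: float = 1e-3,
) -> List[float]:
    """Average the SIF-weighted vectors of the tokens that have a vector.

    Tokens without a vector are skipped. Tokens without a known
    probability are treated as p(w) = 0, which gives them weight 1.
    Returns an empty list if no token has a vector.
    """
    known = [token for token in tokens if token in vectors]
    if not known:
        return []
    dim = len(vectors[known[0]])
    total = [0.0] * dim
    for token in known:
        weight = sif_weight(probabilities.get(token, 0.0), a)
        for i, value in enumerate(vectors[token]):
            total[i] += weight * value
    # Divide by the number of contributing tokens (the sentence length).
    return [value / len(known) for value in total]


# Toy two-dimensional vectors and probabilities, for illustration only.
toy_vectors = {"the": [1.0, 0.0], "cat": [0.0, 1.0]}
toy_probabilities = {"the": 0.1, "cat": 0.001}

print(weighted_sentence_vector(["the", "cat"], toy_vectors, toy_probabilities))
```

In this toy case the rare word "cat" dominates the result, which is the intended effect of the weighting.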
There may be a way to do this already that I am not aware of.
Could the feature be a custom component or spaCy plugin?
I will provide a custom spaCy component for SIF embeddings here: