-
Notifications
You must be signed in to change notification settings - Fork 4
Open
Description
A simple and effective model for thinking about text documents in machine learning is called the Bag-of-Words Model, or BoW.
The model is simple in that it throws away all of the order information in the words and focuses on the occurrence of words in a document.
This can be done by assigning each word a unique number. Then any document we see can be encoded as a fixed-length vector with the length of the vocabulary of known words. The value in each position in the vector could be filled with a count or frequency of each word in the encoded document.
Metadata
Metadata
Assignees
Labels
No labels