Text Feature Extraction
Sagen Soren edited this page Nov 30, 2020 · 2 revisions
Bag of words is a feature extraction method that turns text into numeric features for training machine learning models. It is one of the fundamental methods for converting tokens into features, and it consists of three steps:
- Text preprocessing:
  - convert the entire text to lowercase
  - remove all punctuation and unnecessary symbols
- Vocabulary creation:
  - build the set of unique words that occur in the text
- Text vectorization:
  - create a feature matrix with one column per vocabulary word and one row per sentence
  - set a cell to 1 if the word appears in that sentence, and 0 if it does not
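The three steps above can be sketched in plain Python (3.9+) as follows. This is a minimal illustration, not the author's implementation. The function names `preprocess`, `build_vocabulary` and `vectorize` and the sample sentences are hypothetical. The punctuation filter assumes ASCII English text, and the vocabulary is sorted only to give the columns a stable order.

```python
import re


def preprocess(text: str) -> list[str]:
    """Step 1: lowercase the text, strip punctuation/symbols, split into tokens."""
    text = text.lower()
    # Assumption: ASCII text. Anything other than a-z, 0-9 or whitespace becomes a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return text.split()


def build_vocabulary(sentences: list[list[str]]) -> list[str]:
    """Step 2: collect the unique words; sorted only so column order is stable."""
    return sorted({word for tokens in sentences for word in tokens})


def vectorize(sentences: list[list[str]], vocabulary: list[str]) -> list[list[int]]:
    """Step 3: one row per sentence, one column per word; 1 if present, else 0."""
    rows = []
    for tokens in sentences:
        present = set(tokens)  # presence only, so repeated words still give 1
        rows.append([1 if word in present else 0 for word in vocabulary])
    return rows


# Hypothetical example corpus: each string is treated as one sentence.
corpus = [
    "The cat sat on the mat.",
    "The dog ate my homework!",
]
tokenized = [preprocess(s) for s in corpus]
vocab = build_vocabulary(tokenized)
matrix = vectorize(tokenized, vocab)

print(vocab)
for row in matrix:
    print(row)
```

Running it prints the vocabulary `['ate', 'cat', 'dog', 'homework', 'mat', 'my', 'on', 'sat', 'the']` followed by the rows `[0, 1, 0, 0, 1, 0, 1, 1, 1]` and `[1, 0, 1, 1, 0, 1, 0, 0, 1]`. Although "the" occurs twice in the first sentence, its cell is still 1, because this variant records presence rather than counts. scikit-learn's `CountVectorizer` with `binary=True` builds a comparable matrix, but its default token pattern ignores single-character tokens, so its columns may differ from this sketch.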