woodminus/NLP-Processing-X

Exploration and modeling of language with a focus on CMI and perplexity obtained from unigram, bigram, and trigram models on code-mixed data.

Natural Language Processing

Dataset used: Twitter code-mixed data.

  1. Language Modelling:
  • Calculated trigram, bigram, and unigram perplexities on the code-mixed data (a minimal perplexity sketch follows this list).
  2. CMI vs. Perplexity:
  • Calculated the Code Mixing Index (CMI) for each tweet and separated the tweets into 10 sets based on their CMI values. For each set we computed the perplexity and examined the relation between CMI and perplexity on the collected data (see the CMI sketch after this list).

  • Each folder has a README.md inside describing what we have done.
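
For reference, a minimal sketch of how n-gram perplexity could be computed on tokenized tweets. The function names, add-k smoothing, and sentence padding are illustrative assumptions, not the repository's actual implementation.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def perplexity(test_sents, train_sents, n, k=1.0):
    """Perplexity of an add-k-smoothed n-gram model (n = 1, 2, or 3).

    train_sents / test_sents: lists of tokenized tweets (lists of strings).
    """
    counts, context_counts, vocab = Counter(), Counter(), set()
    for sent in train_sents:
        tokens = ["<s>"] * (n - 1) + sent + ["</s>"]
        vocab.update(tokens)
        for gram in ngrams(tokens, n):
            counts[gram] += 1
            context_counts[gram[:-1]] += 1  # for unigrams the context is (), i.e. total token count
    V = len(vocab)
    log_prob, N = 0.0, 0
    for sent in test_sents:
        tokens = ["<s>"] * (n - 1) + sent + ["</s>"]
        for gram in ngrams(tokens, n):
            # add-k smoothing gives unseen n-grams a nonzero probability
            p = (counts[gram] + k) / (context_counts[gram[:-1]] + k * V)
            log_prob += math.log(p)
            N += 1
    return math.exp(-log_prob / N)  # perplexity = exp(-average log-likelihood)
```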
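And a sketch of the standard CMI formula (Gambäck & Das) together with the 10-way bucketing step. The per-token language tags, the "univ" label for language-independent tokens, and the `(tokens, lang_tags)` input format are assumptions about the annotation scheme.

```python
from collections import Counter

def cmi(lang_tags):
    """Code Mixing Index of one tweet, given per-token language tags.

    CMI = 100 * (1 - max_lang / (n - u)), where n is the token count,
    u the number of language-independent tokens (tagged 'univ' here),
    and max_lang the count of the dominant language; 0 if n == u.
    """
    n = len(lang_tags)
    u = sum(1 for tag in lang_tags if tag == "univ")
    if n == u:
        return 0.0
    max_lang = max(Counter(t for t in lang_tags if t != "univ").values())
    return 100.0 * (1 - max_lang / (n - u))

def bucket_by_cmi(tagged_tweets):
    """Split tweets into 10 CMI bands: [0, 10), [10, 20), ..., [90, 100].

    tagged_tweets: iterable of (tokens, lang_tags) pairs (hypothetical format).
    """
    buckets = [[] for _ in range(10)]
    for tokens, tags in tagged_tweets:
        buckets[min(int(cmi(tags) // 10), 9)].append(tokens)
    return buckets
```

Perplexity could then be computed per bucket to trace the CMI vs. perplexity relation described above.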

Contributors:

M R Abhishek and K Vagdevi
