
Twitter-Data-Normalization

Software

Python 3.4

Sample Dataset

http://www.kemik.yildiz.edu.tr/data/File/2milyon_tweet.rar

Project Steps

Analyze the dataset and find the common mistakes users make when writing tweets ✅
Tokenize the tweets with NLTK ✅
Analyze the words and:

identify and correct emphasized words ✅
add letters left out of words ✅
correct Turkish SMS-style words ✅
identify emojis ✅
identify mentions ✅
identify hashtags ✅
identify URLs ✅
identify punctuation ✅
identify symbols ✅
correct accent marks ✅
remove extra whitespace ✅
build a deasciifier (restores Turkish letters such as ç, ğ, ı, ö, ş, ü in text typed with ASCII letters) ✅

Test the results ✅
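The tokenizing and identification steps can be pictured as a token classifier. This is a minimal sketch, not the repository's code. The names classify, tag_tokens and PATTERNS are hypothetical. The regular expressions are approximations, and the emoji ranges cover only the common Unicode blocks. Plain whitespace splitting stands in for NLTK so the sketch has no dependencies. NLTK's TweetTokenizer (nltk.tokenize) would be the natural replacement for real tweets.

```python
import re

# Hypothetical token labels; the patterns approximate Twitter conventions
# and are not taken from the repository.
PATTERNS = [
    ("URL", re.compile(r"^(?:https?://|www\.)\S+$", re.IGNORECASE)),
    ("MENTION", re.compile(r"^@\w+$")),
    ("HASHTAG", re.compile(r"^#\w+$")),
    # Common emoji blocks plus variation selector-16 and zero-width joiner.
    ("EMOJI", re.compile("^[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]+$")),
    ("PUNCTUATION", re.compile(r"^[.,;:!?\"'()\[\]…-]+$")),
    # Anything else made only of non-word, non-space characters.
    ("SYMBOL", re.compile(r"^[^\w\s]+$")),
]


def classify(token):
    """Return the first matching label, or WORD for ordinary text."""
    for label, pattern in PATTERNS:
        if pattern.match(token):
            return label
    return "WORD"


def tag_tokens(text):
    """Split on whitespace (a stand-in for NLTK) and label each token."""
    return [(token, classify(token)) for token in text.split()]


print(tag_tokens("@kemik çooook güzel 😀 #nlp https://t.co/x !!"))
```

Order matters here. The URL pattern runs first so a link is never labeled a symbol. Punctuation attached to a word (merhaba!) stays inside a WORD token unless a real tokenizer splits it off first.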
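Emphasized words (letters stretched for effect) and extra whitespace can be handled with a couple of regular expressions. The sketch below is an assumption about how such a step might work, not the repository's implementation. The names reduce_emphasis and normalize_whitespace are hypothetical, and the lexicon argument stands in for whatever Turkish word list is available. Turkish has legitimate double letters (saat, hakkında), so a run of three or more identical letters is first shortened to two and checked against the lexicon. It is shortened to one only if that check fails.

```python
import re

# Three or more repetitions of the same letter (digits and "_" excluded).
RUN = re.compile(r"([^\W\d_])\1{2,}")


def normalize_whitespace(text):
    """Collapse runs of spaces, tabs and newlines into single spaces."""
    return " ".join(text.split())


def reduce_emphasis(word, lexicon):
    """Shorten stretched letters, preferring a form found in the lexicon."""
    if not RUN.search(word):
        return word
    doubled = RUN.sub(r"\1\1", word)
    if doubled in lexicon:
        return doubled
    return RUN.sub(r"\1", word)


lexicon = {"saat", "çok", "güzel"}
print(normalize_whitespace("  saaaat   kaç \n"))
print(reduce_emphasis("saaaat", lexicon), reduce_emphasis("çoooook", lexicon))
```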
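A deasciifier restores Turkish letters that were typed as their closest ASCII equivalents, for example cok for çok. One simple way to illustrate the idea is dictionary lookup. Each ambiguous letter expands into its possible Turkish forms, and the first candidate found in a word list is kept. This is a hypothetical brute-force sketch, not the repository's own deasciifier. The names deasciify_word, candidates and max_ambiguous are made up, and the input is assumed to be lowercase.

```python
from itertools import product

# Lowercase ASCII letters that may stand for a Turkish-specific letter.
VARIANTS = {
    "c": ("c", "ç"),
    "g": ("g", "ğ"),
    "i": ("i", "ı"),
    "o": ("o", "ö"),
    "s": ("s", "ş"),
    "u": ("u", "ü"),
}


def candidates(word):
    """All spellings from swapping ambiguous letters; the unchanged word is first."""
    options = [VARIANTS.get(ch, (ch,)) for ch in word]
    return ["".join(combo) for combo in product(*options)]


def deasciify_word(word, lexicon, max_ambiguous=10):
    """Return the first candidate found in the lexicon, else the word unchanged."""
    ambiguous = sum(1 for ch in word if ch in VARIANTS)
    if ambiguous > max_ambiguous:  # 2 ** ambiguous candidates would be too many
        return word
    for candidate in candidates(word):
        if candidate in lexicon:
            return candidate
    return word


lexicon = {"çok", "güzel", "kitap"}
print([deasciify_word(w, lexicon) for w in "cok guzel kitap".split()])
```

Trying the unchanged word first keeps correct ASCII-only words intact. The cost is that a genuinely ambiguous pair, where both spellings are valid Turkish words, is resolved by this order rather than by context.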
