Our project was to perform sentiment analysis on a Corona tweet dataset. Sentiment analysis is the process of detecting positive or negative sentiment in text. It’s often used by businesses to detect sentiment in social data, gauge brand reputation, and understand customers. It has applications in almost any industry, from finance and retail to hospitality and technology.

Our dataset has 41157 rows and 6 columns. Each tweet in it is already labelled with a sentiment category. Using these labels, we developed a model that can be used to classify new text by sentiment.

During data cleaning and wrangling we found that the location column had 20.8% null values, the most of any column. We treated this by filling the null values with the mode of the column. We then did EDA on the data and gained some meaningful insights, which are discussed in the presentation.

For model preparation we used the CountVectorizer approach as well as the Tf-Idf approach. Personally, I am more comfortable with the Tf-Idf approach. The data needed to be pre-processed before it could be used in the model, so we applied stemming, lemmatisation, removal of stop words, etc. I had some trouble defining the pre-processing functions, because the steps need to run in the proper sequence; otherwise one step can spoil the input of the steps that follow it.

We used the classification report to evaluate the model overall, as it presents all the metric scores clearly. After this project I feel confident in performing sentiment analysis as well as text processing.
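To illustrate why the order of the pre-processing steps matters, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the function name `preprocess`, the tiny stop-word list, and the sample tweet are all made up for illustration. Stemming or lemmatisation (for example with a library such as NLTK) would come after these steps and is left out to keep the sketch self-contained.

```python
import re
import string

# Tiny illustrative stop-word list (an assumption; the project's real list is not shown here).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "and", "for", "on", "at", "this"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(tweet: str) -> str:
    """Clean one tweet. The order of the steps is deliberate."""
    # 1. Lowercase first so later matching is case-insensitive.
    text = tweet.lower()
    # 2. Remove URLs and @mentions BEFORE stripping punctuation. If punctuation
    #    went first, "https://t.co/x" would collapse into the junk token
    #    "httpstcox" and the URL pattern could no longer find it.
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # 3. Strip punctuation. This also drops "#" but keeps the hashtag word.
    text = text.translate(PUNCT_TABLE)
    # 4. Tokenise on whitespace and drop stop words last, once tokens are clean.
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    # Made-up example tweet, not taken from the dataset.
    sample = "Panic buying at the supermarket is getting worse! https://t.co/abc123 @someuser #coronavirus"
    print(preprocess(sample))
```

The cleaned strings returned by a function like this are what would then be passed to a vectorizer such as CountVectorizer or a Tf-Idf vectorizer.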
SajalSinha/Corona_tweet_sentimental_Analysis