Semi-Structured Data Processing with MongoDB: Collecting Social Media Data from the Twitter Real-Time Data Stream, Storing It in MongoDB, and Retrieving It for Processing
This project has three main scripts, which cover four tasks:
Task 1 - Data collection: lab3_twitter/scrap_twitter_sab.py
- Automates scraping of the Twitter API v2 using the tweepy Python package.
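A minimal sketch of paginated collection through tweepy's `Client.search_recent_tweets`, assuming a bearer-token client and a 6k-tweet cap. The helper name `collect_tweets` and the field list are illustrative assumptions, not code from scrap_twitter_sab.py. Pagination follows the `next_token` value that the v2 endpoint returns in the response's `meta`.

```python
from typing import Any, Dict, List, Optional


def collect_tweets(client: Any, query: str, limit: int = 6000) -> List[Dict[str, Any]]:
    """Page through recent-search results until `limit` tweets are collected.

    `client` is expected to behave like tweepy.Client: search_recent_tweets()
    returns a Response whose .data is a list of Tweet objects (or None) and
    whose .meta dict may contain a "next_token" for the next page.
    """
    tweets: List[Dict[str, Any]] = []
    next_token: Optional[str] = None
    while len(tweets) < limit:
        # The v2 recent-search endpoint accepts 10..100 results per request.
        page_size = max(10, min(100, limit - len(tweets)))
        kwargs: Dict[str, Any] = {
            "query": query,
            "max_results": page_size,
            "tweet_fields": ["created_at", "lang"],
        }
        if next_token:
            kwargs["next_token"] = next_token
        response = client.search_recent_tweets(**kwargs)
        for tweet in response.data or []:
            # tweepy Tweet objects keep the raw JSON payload in .data
            tweets.append(tweet.data)
        next_token = (response.meta or {}).get("next_token")
        if not next_token:
            break  # no further pages available
    return tweets[:limit]
```

Against the live API, the client would be created with something like `tweepy.Client(bearer_token=..., wait_on_rate_limit=True)` and called as `collect_tweets(client, "Russia Ukraine war lang:en -is:retweet")`; that query string is a hypothetical example of the topic filter.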
Task 2 - Data storage: lab3_twitter/MongoInsert_sab.py
- Stores the JSON text_data of 6k tweets in MongoDB.
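One way to store the collected tweets, sketched around pymongo's `Collection.insert_many`. Using the tweet ID as MongoDB's `_id` so that re-runs do not create duplicate documents is an assumption of this sketch; MongoInsert_sab.py may store the data differently.

```python
from typing import Any, Dict, Iterable, List


def to_documents(tweets: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Copy each tweet dict and use its Twitter ID as MongoDB's _id.

    Duplicate IDs within the batch are dropped, keeping the first occurrence.
    """
    seen = set()
    docs: List[Dict[str, Any]] = []
    for tweet in tweets:
        tweet_id = tweet["id"]
        if tweet_id in seen:
            continue
        seen.add(tweet_id)
        doc = dict(tweet)  # copy so the caller's dict is not mutated
        doc["_id"] = tweet_id
        docs.append(doc)
    return docs


def store_tweets(collection: Any, tweets: Iterable[Dict[str, Any]]) -> int:
    """Insert tweets into a pymongo-style collection; return how many were sent."""
    docs = to_documents(tweets)
    if docs:  # insert_many rejects an empty list
        # ordered=False lets the server insert the remaining documents even if
        # some fail (e.g. an _id from an earlier run); pymongo still raises
        # BulkWriteError afterwards in that case.
        collection.insert_many(docs, ordered=False)
    return len(docs)
```

With a local server this could be called as `store_tweets(MongoClient("mongodb://localhost:27017")["twitter_db"]["tweets"], tweets)`, where `twitter_db` and `tweets` are hypothetical database and collection names.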
Task 3 - Text analysis: lab3_twitter/TextAnalysis_EC.py
- Using WordNet, identifies bigrams, trigrams, and polysemous words from stopword-filtered tokens.
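The Task 3 steps can be sketched as stopword removal, n-gram counting, and a WordNet sense count. Treating a word as polysemous when WordNet lists more than one synset for it is an assumed criterion, and the tiny inline stopword set stands in for a fuller list; TextAnalysis_EC.py may use different rules. The synset lookup is passed in as a function so the logic runs without the WordNet corpus installed.

```python
import re
from collections import Counter
from typing import Callable, Dict, Iterable, List, Sequence

# Tiny illustrative stopword set; a real run would more likely use
# nltk.corpus.stopwords.words("english").
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "are", "for"}


def clean_tokens(text: str, stopwords: Iterable[str] = STOPWORDS) -> List[str]:
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    stop = set(stopwords)
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stop]


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count contiguous n-grams (n=2 for bigrams, n=3 for trigrams)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def polysemous_words(tokens: Iterable[str],
                     synsets: Callable[[str], list]) -> Dict[str, int]:
    """Map each word with more than one WordNet sense to its sense count."""
    result: Dict[str, int] = {}
    for word in set(tokens):
        senses = len(synsets(word))
        if senses > 1:
            result[word] = senses
    return result
```

With NLTK installed and the corpus fetched via `nltk.download("wordnet")`, the real lookup would be `from nltk.corpus import wordnet` followed by `polysemous_words(tokens, wordnet.synsets)`.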
Task 4 - Sentiment analysis (added later): lab3_twitter/TextAnalysis_EC.py
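The README does not say which sentiment method Task 4 uses. As one possible approach, this sketch labels tweets from a VADER-style `compound` score in [-1, 1], as returned by NLTK's `SentimentIntensityAnalyzer.polarity_scores`; the ±0.05 cut-offs are the thresholds commonly used with VADER, not values taken from this project. The scorer is injected so the labeling logic can be checked offline.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


def label_sentiment(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER-style compound score to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def sentiment_summary(texts: Iterable[str],
                      score: Callable[[str], Dict[str, float]]) -> Dict[str, int]:
    """Count positive/neutral/negative labels over a collection of texts."""
    labels: List[str] = [label_sentiment(score(t)["compound"]) for t in texts]
    counts = Counter(labels)
    return {k: counts.get(k, 0) for k in ("positive", "neutral", "negative")}
```

With the real analyzer (after `nltk.download("vader_lexicon")`), this would be `from nltk.sentiment import SentimentIntensityAnalyzer` and `sentiment_summary(texts, SentimentIntensityAnalyzer().polarity_scores)`.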
Twitter data directory: scraped data is stored as JSON, and only the full tweet text is converted to CSV.
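A minimal sketch of the JSON-to-CSV step, assuming the scraped file holds a JSON array of tweet objects. Reading `full_text` first and falling back to `text` is an assumption: API v1.1 in extended mode returns `full_text`, while v2 normally returns the text in `text`. The function names are hypothetical.

```python
import csv
import json
from typing import Any, Dict, List


def tweet_text(tweet: Dict[str, Any]) -> str:
    """Prefer v1.1 extended-mode `full_text`, else the v2 `text` field."""
    return tweet.get("full_text") or tweet.get("text", "")


def json_to_text_csv(json_path: str, csv_path: str) -> int:
    """Write only the tweet text from a JSON array of tweets to a one-column CSV.

    Returns the number of data rows written (header excluded).
    """
    with open(json_path, encoding="utf-8") as f:
        tweets: List[Dict[str, Any]] = json.load(f)
    # newline="" lets the csv module handle line endings, including
    # newlines embedded inside tweet text.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["full_text"])
        for tweet in tweets:
            writer.writerow([tweet_text(tweet)])
    return len(tweets)
```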
Result directory: bigram.csv, Top10Words.csv, trigram.csv, and the polysemy detection results are all stored as CSV files.
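A file like Top10Words.csv can be produced with `collections.Counter.most_common`. The two-column `word,count` layout and the function name are assumptions about the output format, not taken from the repository.

```python
import csv
from collections import Counter
from typing import Iterable, List, Tuple


def write_top_words(tokens: Iterable[str], csv_path: str,
                    k: int = 10) -> List[Tuple[str, int]]:
    """Write the k most frequent tokens to csv_path and return them."""
    top = Counter(tokens).most_common(k)  # ties keep first-seen order
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "count"])
        writer.writerows(top)
    return top
```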
Config.ini: Twitter API key, API token, and secret used to communicate with Twitter API v1 and v2.
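A sketch of how such a file might be read with Python's standard `configparser`. The `[twitter]` section and the key names below are hypothetical; the project's actual Config.ini may name them differently.

```python
import configparser
from typing import Dict

# Hypothetical layout; the real section and key names may differ.
EXAMPLE_INI = """
[twitter]
api_key = YOUR_API_KEY
api_key_secret = YOUR_API_KEY_SECRET
access_token = YOUR_ACCESS_TOKEN
access_token_secret = YOUR_ACCESS_TOKEN_SECRET
bearer_token = YOUR_BEARER_TOKEN
"""

REQUIRED_KEYS = ("api_key", "api_key_secret", "access_token",
                 "access_token_secret", "bearer_token")


def load_twitter_credentials(parser: configparser.ConfigParser,
                             section: str = "twitter") -> Dict[str, str]:
    """Return the required credential values, failing early if any is missing."""
    if not parser.has_section(section):
        raise KeyError(f"missing [{section}] section")
    missing = [k for k in REQUIRED_KEYS if not parser.has_option(section, k)]
    if missing:
        raise KeyError(f"missing keys: {', '.join(missing)}")
    return {k: parser.get(section, k) for k in REQUIRED_KEYS}
```

In practice the parser would load the file with `parser.read("Config.ini")`. For v1.1 endpoints, tweepy's `OAuth1UserHandler` takes the key, key secret, access token, and access token secret, while v2 search can use the bearer token with `tweepy.Client`.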
Requirements.text: lists all the frameworks, WordNet resources, and packages used in this project.
I requested Elevated Access in the Twitter Developer Portal to scrape 6k user tweets on the topic Russia_Ukraine_War.