I started this project to analyze the same dataset in both Python and R, running the same analysis and answering the same questions that came to mind while exploring the data. I focus mainly on categorical data and try different techniques on it.
This project helped me understand the limitations and advantages of Python and R, and showed me which tasks are better suited to each language.
- EDA.
- Clean the dataset.
- Organize the dataset's column types and use the DateTime type.
- Plot charts for specific criteria, using different chart types to represent the same criteria in Python and R.
- Work with categorical values: organize, filter, count, change values, and plot values with their counts, marking the maximum and the percentages.
- Some experiments that I tried but did not fully finish, and will probably work on more in the future:
NLP in Python:
I used LDA and a word cloud, along with several text-mining packages. I applied an LDA model to the data, extracted the most used words, and plotted the result in a format suited to the LDA model.
I extracted the 10 most used words and used them to help search through the complaints.
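As a rough illustration of that last step, the sketch below counts word frequencies with the standard library and uses the top words to search complaint text. It is a plain frequency count, not the LDA model described above, and the sample complaints, the `STOPWORDS` set, and the function names `tokenize`, `top_words`, and `search_complaints` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample complaints; the real dataset is not reproduced here.
COMPLAINTS = [
    "Slow internet speed every evening",
    "Billing error on my monthly bill",
    "Internet outage for three days",
    "Charged twice, billing department no help",
]

# Small illustrative stop-word list (an assumption, not the one used in the project).
STOPWORDS = {"on", "my", "for", "no", "every", "the", "a", "and"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens, dropping stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_words(complaints, n=10):
    """Return the n most frequent words across all complaints."""
    counts = Counter()
    for text in complaints:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(n)]


def search_complaints(complaints, keyword):
    """Return the complaints whose tokens include the keyword."""
    keyword = keyword.lower()
    return [text for text in complaints if keyword in tokenize(text)]


if __name__ == "__main__":
    print(top_words(COMPLAINTS, n=10))
    print(search_complaints(COMPLAINTS, "internet"))
```

Matching on whole tokens rather than substrings keeps a keyword such as "bill" from also matching "billing"; whether that is the desired behavior depends on how the search is meant to be used.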
NLP in R:
I tried to apply the same approach in R that I used in Python, but I ran into difficulties with NLP in R, so I may work more on this aspect in the future.
My idea is to use clustering to extract the most used words in the complaints and add a column containing the type of each complaint. For example, a complaint about slow internet would have the complaint type "internet".
This can help Comcast classify complaints into types and address problems in depth rather than one complaint at a time. For example, if all complaints on a certain day are about the internet, there may be a problem with the company's service.
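The idea can be sketched as follows, assuming a hand-written keyword map in place of clustering: each complaint is tagged with a type, and days on which a single type dominates are flagged. The records, the `TYPE_KEYWORDS` map, the date format, and the 0.6 threshold are hypothetical and not taken from the project.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical (date, complaint text) records; the date format is an assumption.
RECORDS = [
    ("2015-06-01", "Slow internet speed"),
    ("2015-06-01", "Internet keeps dropping"),
    ("2015-06-01", "Wrong billing amount"),
    ("2015-06-02", "Billing error"),
    ("2015-06-02", "Cable box broken"),
]

# Hypothetical keyword-to-type map standing in for clustered top words.
TYPE_KEYWORDS = {
    "internet": "internet",
    "billing": "billing",
    "cable": "tv",
}


def complaint_type(text):
    """Return the type of the first matching keyword, or 'other'."""
    lowered = text.lower()
    for keyword, label in TYPE_KEYWORDS.items():
        if keyword in lowered:
            return label
    return "other"


def dominant_types_by_day(records, threshold=0.6):
    """Map each date to (type, share) when one type reaches the threshold share."""
    per_day = defaultdict(Counter)
    for date_text, complaint in records:
        day = datetime.strptime(date_text, "%Y-%m-%d").date()
        per_day[day][complaint_type(complaint)] += 1

    flagged = {}
    for day, counts in per_day.items():
        label, count = counts.most_common(1)[0]
        share = count / sum(counts.values())
        if share >= threshold:
            flagged[day] = (label, share)
    return flagged


if __name__ == "__main__":
    for day, (label, share) in dominant_types_by_day(RECORDS).items():
        print(day, label, f"{share:.0%}")
```

A fixed keyword map is only a stand-in here; the types produced by a clustering or topic model could be substituted for `complaint_type` without changing the per-day counting.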
- Comcast Telcom Complaints R Markdown.
- Comcast Telcom Complaints Python.