Repository to track all the work for the capstone project.
Created by Manojkumar Parmar for the Data Science Specialization capstone project.
The training data provided through the course is the starting point and will be the basis for most of the capstone. To start, download the data from the Coursera site rather than from external websites.
Your initial exploration of the data and your modeling steps will be performed on this data set. Later in the capstone, if you find additional data sets that may be useful for building your model, you may use them.
Tasks to accomplish:
- Obtaining the data: Can you download the data and load and manipulate it in R?
- Familiarizing yourself with NLP and text mining: Learn the basics of natural language processing (NLP) and how it relates to the data science process you learned in the Data Science Specialization.

Questions to consider:
- What do the data look like?
- Where do the data come from?
- Can you think of any other data sources that might help you in this project?
- What are the common steps in natural language processing?
- What are some common issues in the analysis of text data?
- What is the relationship between NLP and the concepts you learned in the Specialization?
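The common NLP steps asked about in these questions (sampling a large corpus, normalizing case, stripping punctuation and digits, tokenizing, and counting n-grams) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the project's R implementation: all function names are hypothetical, the input is assumed to be a UTF-8 plain-text file with one document per line, and the regular expression keeps only ASCII letters and apostrophes, so accented characters and emoji are dropped (one of the common issues with real text data).

```python
import random
import re
from collections import Counter


def read_sample(path, fraction=0.05, seed=42):
    """Return a random subset of lines from a large UTF-8 text file.

    Hypothetical helper: each line is kept with probability `fraction`,
    so exploration can run on a small sample of a large corpus file.
    """
    rng = random.Random(seed)
    # errors="replace" avoids crashing on malformed bytes in scraped text
    with open(path, encoding="utf-8", errors="replace") as f:
        return [line for line in f if rng.random() < fraction]


def clean_line(line):
    """Lowercase, then replace anything but ASCII letters/apostrophes with spaces."""
    line = line.lower()
    line = re.sub(r"[^a-z'\s]", " ", line)
    # Collapse repeated whitespace (including trailing newlines) and trim
    return re.sub(r"\s+", " ", line).strip()


def ngrams(tokens, n):
    """Return the space-joined n-grams of a token list (empty if too short)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(lines, n):
    """Clean each line, tokenize on whitespace, and count n-grams across all lines."""
    counts = Counter()
    for raw in lines:
        tokens = clean_line(raw).split()
        counts.update(ngrams(tokens, n))
    return counts


# Small in-memory demo; on the real corpus you would pass read_sample(path) instead.
sample = ["I love New York!", "New York is big.", "I love it."]
bigrams = ngram_counts(sample, 2)
print(bigrams.most_common(3))
```

Sampling before counting keeps exploration fast when the corpus files are large; the `fraction` and `seed` defaults are arbitrary choices for illustration.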