This project is an e-learning course recommendation system developed using Python. The system reads course data, preprocesses it, and recommends similar courses based on a given input course. Recommendations are based on word-count text vectorization and cosine similarity between courses.
To run this project locally, you'll need to have Python installed. The required libraries can be installed using the following command:
pip install numpy pandas matplotlib scikit-learn nltk
- Clone the repository.
- Ensure you have the required dataset (Coursera.csv) in the correct path (/content/Coursera.csv).
- Run the Backend.ipynb Jupyter notebook.
Technical Methods Used
- Data Cleaning and Preprocessing (String Replacement): Removing spaces and special characters from course names, descriptions, and skills to standardize the text data.
- Feature Engineering (Tag Creation): Combining the relevant columns (Course Name, Difficulty Level, Course Description, and Skills) into a single tags column that serves as one combined text feature for each course.
- Text Vectorization (CountVectorizer): Converting the tags text into numerical word-count vectors using CountVectorizer from scikit-learn, which puts the text in a format suitable for machine learning algorithms.
- Text Normalization (Stemming): Using the Porter Stemmer from the nltk library to reduce words to their root form, so that variants such as "programming" and "programs" map to the same term. This shrinks the feature space and lets the similarity calculation match related words.
- Similarity Calculation (Cosine Similarity): Calculating the cosine similarity between course vectors to measure how similar two courses are. The courses with the highest similarity to a given input course are returned as recommendations.
- Model Training and Testing
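
The technical methods listed above can be sketched end to end as follows. This is a minimal illustration, not the notebook's actual code: the three sample rows are hypothetical stand-ins for Coursera.csv (only the column names come from this README), the `clean`, `stem`, and `recommend` helpers are names invented for the example, and the cleaning rule (replacing non-alphanumeric characters with a space) is an assumption about how the string replacement might work.

```python
import re

import numpy as np
import pandas as pd
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical sample rows standing in for Coursera.csv.
courses = pd.DataFrame({
    "Course Name": [
        "Python Programming Basics",
        "Advanced Python Programming",
        "Watercolor Painting",
    ],
    "Difficulty Level": ["Beginner", "Advanced", "Beginner"],
    "Course Description": [
        "Learn programming with Python: variables, loops and functions.",
        "Advanced Python programming: decorators, generators and testing.",
        "Paint landscapes with watercolor brushes and paper.",
    ],
    "Skills": [
        "python programming scripting",
        "python programming testing",
        "painting drawing color",
    ],
})

TEXT_COLUMNS = ["Course Name", "Difficulty Level", "Course Description", "Skills"]
stemmer = PorterStemmer()


def clean(text):
    """Lowercase the text and replace non-alphanumeric runs with one space."""
    return re.sub(r"[^a-z0-9]+", " ", str(text).lower()).strip()


def stem(text):
    """Reduce every word to its Porter stem."""
    return " ".join(stemmer.stem(word) for word in text.split())


# Tag creation: merge the cleaned text columns into one string, then stem it.
courses["tags"] = (
    courses[TEXT_COLUMNS]
    .apply(lambda row: " ".join(clean(value) for value in row), axis=1)
    .apply(stem)
)

# Vectorization: one word-count vector per course.
vectors = CountVectorizer().fit_transform(courses["tags"])

# Pairwise cosine similarity between all course vectors.
similarity = cosine_similarity(vectors)


def recommend(course_name, top_n=5):
    """Return up to top_n course names most similar to course_name."""
    positions = np.flatnonzero((courses["Course Name"] == course_name).to_numpy())
    if len(positions) == 0:
        return []  # unknown course: nothing to recommend
    idx = positions[0]
    ranked = similarity[idx].argsort()[::-1]  # most similar first
    names = courses["Course Name"].tolist()
    return [names[i] for i in ranked if i != idx][:top_n]


print(recommend("Python Programming Basics", top_n=1))
```

With the real dataset, the `courses` frame would instead be loaded with `pd.read_csv("/content/Coursera.csv")`, assuming the file contains the four columns named above.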
- React.js
- Login / SignUp
- Dashboard
- npm i - to install the dependencies
- npm run dev - to start the server