This project collected and analyzed large-scale YouTube data with two goals: predicting video topics and building a content recommendation system.
We gathered the data through YouTube's APIs, collecting fields such as video titles, descriptions, tags, view counts, and like counts. After preprocessing the data and running exploratory data analysis (EDA) for initial insights, my team and I applied machine learning, deep learning, and natural language processing (NLP) techniques to predict video topics. Specifically, we used Count Vectorizer, Word2Vec, Naive Bayes, Linear Regression, and LSTM to determine each video's main topic from its title and description.
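As a rough illustration of the Count Vectorizer plus Naive Bayes idea, the sketch below turns titles into word counts and classifies them with multinomial Naive Bayes using Laplace smoothing. It is written in plain Python instead of scikit-learn so it runs without dependencies. The toy titles, topic labels, and the names `tokenize` and `MultinomialNaiveBayes` are hypothetical and are not taken from the project.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep alphanumeric tokens (a simple stand-in for a count vectorizer's tokenizer)
    return re.findall(r"[a-z0-9]+", text.lower())


class MultinomialNaiveBayes:
    """Hypothetical bag-of-words multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per topic (for priors)
        self.word_counts = defaultdict(Counter)      # token counts per topic
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, text, label):
        # log P(label) + sum of log P(token | label), up to a constant
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + self.alpha * len(self.vocab)
        for token in tokenize(text):
            if token not in self.vocab:
                continue  # ignore words never seen during training
            score += math.log((self.word_counts[label][token] + self.alpha) / denom)
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))


# Hypothetical toy training data: video titles with topic labels
train_titles = [
    "Beginner guitar chords lesson",
    "How to play piano songs easily",
    "Top 10 goals of the football season",
    "Basketball highlights and best dunks",
    "Easy pasta recipe for dinner",
    "How to bake chocolate cake at home",
]
train_topics = ["Music", "Music", "Sports", "Sports", "Cooking", "Cooking"]

model = MultinomialNaiveBayes().fit(train_titles, train_topics)
print(model.predict("piano and guitar lesson for beginners"))  # Music
```

Working in log space avoids underflow when multiplying many small word probabilities, and the smoothing term keeps unseen (topic, word) pairs from zeroing out a topic.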
Finally, we developed a recommendation system that suggests relevant videos based on user preferences and past behavior, using a collaborative filtering approach together with techniques such as TF-IDF and Logistic Regression.
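A minimal sketch of how TF-IDF can feed a recommender, assuming a user's past behavior is reduced to a set of watched video indices and each video is represented only by its title text. It ranks unwatched videos by cosine similarity to the watched ones, which is a content-based technique; it does not reproduce the project's Logistic Regression or collaborative filtering components. The function names and toy data are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(video_texts, watched, k=2):
    """Hypothetical helper: rank unwatched videos by best similarity to any watched video.

    Assumes `watched` is a non-empty set of indices into `video_texts`.
    """
    vectors = tfidf_vectors(video_texts)
    scores = []
    for i, vec in enumerate(vectors):
        if i in watched:
            continue
        best = max(cosine(vec, vectors[j]) for j in watched)
        scores.append((best, i))
    # Highest similarity first; ties broken by lower index
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scores[:k]]


# Hypothetical toy catalog of video titles
videos = [
    "guitar chords lesson for beginners",  # 0 (watched)
    "advanced guitar solo lesson",         # 1
    "football season highlights",          # 2
    "easy pasta recipe",                   # 3
    "piano lesson for beginners",          # 4
]
print(recommend(videos, watched={0}, k=2))  # [4, 1]
```

Here the piano lesson ranks first because it shares three weighted terms with the watched video, while the guitar solo shares two; a collaborative filtering model would instead rank videos by what similar users watched.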