I received a Word document named Tasks containing three questions that I had to answer using a Kaggle dataset of YouTube video statistics. The dataset had 20 files: 10 CSV files with video statistics (one per country) and 10 JSON files with the category_id mapping that can be merged with the matching video statistics file. I converted the JSON files to CSV. Some files had to be re-encoded as UTF-8 with BOM before they could be imported in VS Code or JupyterLab. After that I started writing the code file named TemplateForCountries.
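The JSON-to-CSV conversion and the BOM encoding can be sketched as follows. This is only an illustration, not the exact steps I used: it assumes each category file has an "items" list whose entries carry an "id" and a "snippet" with a "title", and the function names, sample data and output file name are hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path


def category_json_to_rows(data):
    """Extract (category_id, category_name) pairs from parsed category JSON.

    Assumes the layout {"items": [{"id": "...", "snippet": {"title": "..."}}]}.
    """
    return [(int(item["id"]), item["snippet"]["title"]) for item in data["items"]]


def write_category_csv(rows, csv_path):
    """Write the pairs to a CSV file encoded as UTF-8 with a BOM."""
    # The "utf-8-sig" codec writes the byte order mark at the start of the file.
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["category_id", "category_name"])
        writer.writerows(rows)


# Hypothetical sample standing in for one country's JSON file.
sample = json.loads(
    '{"items": [{"id": "1", "snippet": {"title": "Film & Animation"}},'
    ' {"id": "10", "snippet": {"title": "Music"}}]}'
)

rows = category_json_to_rows(sample)
csv_path = Path(tempfile.mkdtemp()) / "US_category_id.csv"
write_category_csv(rows, csv_path)
```

For a real file, `json.load` on the opened JSON file would replace the inline sample.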
First I imported the files and checked for missing values, duplicates, and minimum and maximum values (these checks are not included in the code). Then I merged the two datasets for each country on category_id and kept only the columns that would be useful for my analysis. I grouped the data by category_name and combined the categories that accounted for less than 3% into a single Other category. I plotted two pie charts that showed the popular categories in every country. This was needed to answer the first question.
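A minimal sketch of the merge and the Other grouping, using made-up data. It assumes the 3% threshold is applied to each category's share of videos; the column category_group and all sample values are hypothetical.

```python
import pandas as pd

# Hypothetical miniature stand-ins for one country's two files.
videos = pd.DataFrame({
    "category_id": [10] * 20 + [24] * 18 + [1, 2],
    "views": list(range(40)),
})
categories = pd.DataFrame({
    "category_id": [1, 2, 10, 24],
    "category_name": ["Film & Animation", "Autos & Vehicles", "Music", "Entertainment"],
})

# Merge on category_id and keep only the columns needed for the analysis.
merged = videos.merge(categories, on="category_id", how="left")[["category_name", "views"]]

# Share of videos per category (assumption: the threshold refers to video counts).
shares = merged["category_name"].value_counts(normalize=True)
small = shares[shares < 0.03].index

# Relabel every category below the threshold as "Other".
merged["category_group"] = merged["category_name"].where(
    ~merged["category_name"].isin(small), "Other"
)
group_shares = merged["category_group"].value_counts(normalize=True)
```

With matplotlib installed, `group_shares.plot.pie()` would draw the corresponding pie chart.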
Then I calculated the ratios of likes, dislikes and views and exported them as an Excel file. Using conditional formatting, I colored the tables for every country so that the liked and disliked categories are clear at first glance.
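The ratio step could look roughly like this. The exact ratios are an assumption (likes per view, dislikes per view, and likes per dislike, all per category), and the sample numbers and column names for the ratios are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data for one country.
stats = pd.DataFrame({
    "category_name": ["Music", "Music", "News", "News"],
    "views": [1000, 3000, 500, 1500],
    "likes": [100, 300, 10, 30],
    "dislikes": [5, 15, 20, 60],
})

# Sum the counts per category before dividing, so large videos weigh more.
totals = stats.groupby("category_name")[["views", "likes", "dislikes"]].sum()

ratios = pd.DataFrame({
    "likes_per_view": totals["likes"] / totals["views"],
    "dislikes_per_view": totals["dislikes"] / totals["views"],
    "likes_per_dislike": totals["likes"] / totals["dislikes"],
})

# Export for conditional formatting in Excel (needs an Excel writer such as openpyxl):
# ratios.to_excel("ratios.xlsx")
```

Dividing the summed counts gives a ratio for the whole category; averaging per-video ratios instead would be a different design choice and can give different rankings.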
The second code file, named PopularChannels, was used to answer the third question: which channels are popular in most countries. I merged the two files for every country as before, then grouped by channel_title and calculated the sums of likes, dislikes and views. I sorted the result by channel views and selected the top 50 channels for each country. I appended each country's top 50 channels to one combined dataset, then grouped that list by channel_title and sorted the channels by group size, i.e. the number of countries in which a channel made the top 50. This put the most popular channels at the top. I limited the list to 20 channels because I checked and the 20th channel was the last one that was popular in more than half of the countries. I exported the file as Excel.
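The steps above can be sketched on a small scale as follows. The helper top_channels, the country codes and the sample numbers are hypothetical, and the example keeps the top 2 channels per country instead of the top 50 so that the data stays readable.

```python
import pandas as pd


def top_channels(df, n):
    """Return the n channels with the highest total views in one country."""
    totals = df.groupby("channel_title")[["views", "likes", "dislikes"]].sum()
    return totals.sort_values("views", ascending=False).head(n).reset_index()


def make_frame(rows):
    """Build a small per-country frame from (channel, views, likes, dislikes) rows."""
    return pd.DataFrame(rows, columns=["channel_title", "views", "likes", "dislikes"])


# Hypothetical per-country data (already merged with the category file).
country_frames = {
    "US": make_frame([("A", 100, 10, 1), ("A", 50, 5, 0), ("B", 120, 12, 2), ("C", 10, 1, 0)]),
    "GB": make_frame([("A", 200, 20, 2), ("C", 90, 9, 1), ("B", 5, 0, 0)]),
    "DE": make_frame([("A", 500, 50, 5), ("B", 300, 30, 3), ("C", 20, 2, 0)]),
}

# Collect every country's top list in one combined dataset.
tops = []
for country, df in country_frames.items():
    top = top_channels(df, n=2)
    top["country"] = country
    tops.append(top)
combined = pd.concat(tops, ignore_index=True)

# Group size = number of countries in which the channel made the top list.
country_counts = combined.groupby("channel_title").size().sort_values(ascending=False)
```

Calling `country_counts.head(20)` on the full data would then give a list limited to 20 channels.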
I created the file Mistakes in Pandas to avoid while following a YouTube video that shows common mistakes beginners make in Pandas.