This project focuses on exploring and analyzing the Goodreads Best Books dataset, which contains information about thousands of books, including ratings, popularity, genres, authors, and publishers. The goal is to clean and preprocess the data, handle missing values, remove unnecessary columns, and extract valuable insights that reflect readers’ interests and publishing trends. Additionally, I analyze the distribution of numeric features (such as ratings, votes, and pages) to better understand reading behaviors and book characteristics.
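
The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration on a toy DataFrame, not the project's actual code: the column names (`rating`, `pages`, `coverImg`), the `clean_books` helper, and the choice of median imputation are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the dataset. Column names are assumed for illustration only.
raw = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "rating": [4.2, 3.9, np.nan, 4.5],
    "pages": ["320", "210 pages", "180", None],
    "coverImg": ["a.jpg", "b.jpg", "c.jpg", "d.jpg"],
})


def clean_books(df, drop_cols=("coverImg",)):
    """Hypothetical helper: drop unused columns and handle missing values."""
    # Remove columns that are not needed for the analysis (if present).
    out = df.drop(columns=[c for c in drop_cols if c in df.columns])
    # Pull the first run of digits out of free-text page counts, then cast to float.
    pages = out["pages"].str.extract(r"(\d+)", expand=False).astype(float)
    out = out.assign(pages=pages)
    # Drop rows without a rating, then fill missing page counts with the median.
    out = out.dropna(subset=["rating"])
    out = out.assign(pages=lambda d: d["pages"].fillna(d["pages"].median()))
    return out.reset_index(drop=True)


cleaned = clean_books(raw)

# Summary statistics give a first look at the numeric distributions.
summary = cleaned[["rating", "pages"]].describe()
```

Whether to drop or impute a missing value depends on the column. Here rows without a rating are dropped, while missing page counts are imputed, purely as an example of both approaches.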

The main objectives of this analysis are to:
- Prepare the data for analysis through cleaning and preprocessing.
- Explore relationships between variables such as ratings, popularity, genres, and publishers.
- Visualize key trends to answer meaningful business questions:
  - What price are people most willing to pay for a book, especially for books with higher BBE scores?
  - How many pages are readers typically willing to read, based on interest or popularity?
  - What are the top 5 most frequent languages, authors, genres, awards, and publishers?
  - How many books are part of a series versus standalone books?
  - Which books rank in the top 10 based on popularity, weighted rating, number of ratings, and votes?
  - Which genres and publishers have the highest number of authors in the dataset?
  - Which author has the highest popularity, weighted rating, votes, number of ratings, and five-star reviews?
  - Which publishers contribute the most to the overall BBE score and have the greatest influence on popularity and ratings?
  - What is the most common publisher within each genre and for each award?
  - Which books and publishers have received the most awards?
  - Which genres correlate the most with number of ratings, BBE score, and BBE votes?
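
As an illustration of how two of the questions above (top-5 frequencies and series versus standalone counts) might be answered with pandas, here is a small sketch on toy data. The column names (`series`, `genres`, `publisher`) and the `top_n` helper are assumptions for the example, and `genres` is assumed to already hold Python lists. If the genres are stored as strings, they would need to be parsed first.

```python
import pandas as pd

# Toy data. Column names and values are illustrative assumptions.
books = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "series": ["Saga #1", None, "Saga #2", None],
    "genres": [["Fantasy", "Adventure"], ["Romance"], ["Fantasy"], ["Fantasy", "Romance"]],
    "publisher": ["P1", "P2", "P1", "P1"],
})


def top_n(df, column, n=5):
    """Hypothetical helper: the n most frequent values in a column."""
    # explode() gives one row per list element; scalar columns pass through unchanged.
    return df[column].explode().value_counts().head(n)


top_genres = top_n(books, "genres")
top_publishers = top_n(books, "publisher")

# A book counts as part of a series when its `series` field is not missing.
series_counts = (
    books["series"]
    .notna()
    .map({True: "series", False: "standalone"})
    .value_counts()
)
```

The same `top_n` call could be reused for languages, authors, or awards, provided those columns exist in the cleaned data.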

The tools and skills used in this project are:
- Python (Pandas, NumPy, Matplotlib, Seaborn)
- Data Cleaning & Preprocessing
- Exploratory Data Analysis (EDA)
- Data Visualization
- Power BI Dashboard Creation