Project page: https://priyamaggarwal18.github.io/Leetcode_Problem_Dataset_Analysis/
Insights and trends from the 2021 LeetCode problem dataset for coding interview preparation and skill assessment.
- What is LeetCode 2021 Data Analysis?
- How the Analysis Works
- Resources
- Our Amazing Team
This project analyzes the LeetCode 2021 dataset, which contains 1,825 problems. The dataset records each problem's difficulty, acceptance rate, tagged companies, premium status, and user engagement metrics (likes, dislikes, and discussion counts). Using the Python libraries Pandas, Matplotlib, and Seaborn, the analysis examines how problems are distributed across difficulty levels, which companies are tagged most often, how premium and non-premium problems differ, and how acceptance rates vary. These insights help learners understand LeetCode’s problem landscape and prepare more effectively for technical interviews.
Future plans include predictive modeling and enhanced visualization features.
- id: Problem ID number
- title: Problem name/title
- description: Detailed problem description
- is_premium: Premium subscription requirement (0 = no, 1 = yes)
- difficulty: Problem difficulty (Easy, Medium, Hard)
- acceptance_rate: Percentage of submissions that were accepted
- frequency: How often the problem is attempted
- url: URL link to the problem page
- discuss_count: Number of user discussions/comments
- accepted: Total accepted submissions
- submissions: Total submission attempts
- companies: Companies tagged for asking this problem
- related_topics: Related topics/tags
- likes: Number of likes
- dislikes: Number of dislikes
- rating: Ratio of likes to (likes + dislikes)
- asked_by_faang: Flag indicating whether the problem was asked by Facebook, Apple, Amazon, Google, or Netflix
- similar_questions: List of similar problems with name, slug, and difficulty
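The derived columns in the list above (acceptance_rate, rating, asked_by_faang) can be illustrated with a minimal sketch. The function names are hypothetical, the formula relating acceptance_rate to accepted and submissions is an assumption based on the column descriptions, and the dataset may store rating as a percentage rather than a fraction.

```python
# Hypothetical helpers illustrating the derived columns; not the dataset authors' code.
FAANG = frozenset({"Facebook", "Apple", "Amazon", "Google", "Netflix"})


def acceptance_rate(accepted: int, submissions: int) -> float:
    """Percentage of submissions that were accepted (assumed formula)."""
    # Guard against problems with no submissions to avoid division by zero.
    return 100.0 * accepted / submissions if submissions else 0.0


def rating(likes: int, dislikes: int) -> float:
    """Ratio of likes to (likes + dislikes), returned as a fraction in [0, 1]."""
    total = likes + dislikes
    return likes / total if total else 0.0


def asked_by_faang(companies: list[str]) -> int:
    """Return 1 if any tagged company is a FAANG company, else 0.

    Assumes company names match the FAANG set exactly (case-sensitive).
    """
    return int(any(name in FAANG for name in companies))
```

For example, `rating(3, 1)` evaluates to `0.75`, and `asked_by_faang(["Uber", "Amazon"])` evaluates to `1`.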
The dataset is preprocessed by handling missing values and converting columns to appropriate data types. Seaborn and Matplotlib are then used to generate visualizations such as the difficulty distribution, acceptance rate histograms, company frequency bar charts, and premium vs non-premium comparisons. Multi-valued fields such as companies and related topics are parsed with regular expressions. Together, these steps expose patterns and relationships among the problem features.
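The regular-expression extraction described above might look like the following sketch. It assumes that companies and related_topics are stored as comma-separated strings and that missing values appear as None, an empty string, or the text "nan"; the helper names `split_tags` and `company_counts` are hypothetical and not taken from the project.

```python
import re
from collections import Counter


def split_tags(raw) -> list[str]:
    """Split a comma-separated field into a clean list of tags.

    Assumes a format such as "Amazon, Google"; missing values yield [].
    """
    if raw is None:
        return []
    text = str(raw).strip()
    if not text or text.lower() == "nan":
        return []
    # Split on commas, swallowing any whitespace around them,
    # then drop empty fragments left by stray commas.
    return [tag.strip() for tag in re.split(r"\s*,\s*", text) if tag.strip()]


def company_counts(rows) -> Counter:
    """Count how many problems each company is tagged on.

    `rows` is any iterable of dict-like records with a "companies" key.
    """
    counts = Counter()
    for row in rows:
        counts.update(split_tags(row.get("companies")))
    return counts
```

With the data loaded into a Pandas DataFrame `df`, the same helper could be applied column-wise via `df["companies"].apply(split_tags)`, and the resulting counts could feed a company frequency bar chart.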
- Dataset CSV: LeetCode 2021 Problem Dataset CSV
- Dataset Sources:
  - Kaggle: LeetCode Problem Dataset
  - Hugging Face: LeetCode Problem Set
- Python Libraries Used: Pandas, NumPy, Matplotlib, Seaborn, re (regular expressions)
