https://github.com/hemanitekwani/MlProjects/blob/main/Bigmartsales.ipynb
This repository contains code for predicting sales in a BigMart store using machine learning. The project involves data preprocessing, exploratory data analysis, feature engineering, and training an XGBoost regression model.
The dataset used in this project is loaded from a CSV file named Train.csv
. It contains information about various features related to items in the store and their corresponding sales.
The dataset includes the following features:
Item_Identifier
Item_Weight
Item_Fat_Content
Item_Visibility
Item_Type
Item_MRP
Outlet_Identifier
Outlet_Establishment_Year
Outlet_Size
Outlet_Location_Type
Outlet_Type
Item_Outlet_Sales
The dataset is preprocessed in the following steps:
- Handling missing values in
Item_Weight
andOutlet_Size
columns - Exploratory data analysis using visualizations to understand the distribution of features
Label encoding is applied to categorical features to convert them into numerical format for model training.
An XGBoost Regressor model is trained to predict the Item_Outlet_Sales
:
- Data is split into training and testing sets.
- XGBoost Regressor is initialized and trained on the training data.
- R-squared values are calculated to evaluate model performance on both training and testing data.
- NumPy
- Pandas
- Matplotlib
- Seaborn
- scikit-learn
- XGBoost
- Clone the repository.
- Install the required dependencies.
- Run the provided Python script to predict BigMart sales.
https://github.com/hemanitekwani/MlProjects/blob/main/customer%20segmentation.ipynb
This project aims to perform customer segmentation using the K-Means clustering algorithm based on customers' annual income and spending score. The goal is to identify distinct customer groups for targeted marketing strategies.
The dataset used in this project contains customer information including gender, age, annual income, and spending score. It consists of a total of 200 data points.
To run this project, you will need the following dependencies:
- NumPy
- Pandas
- Matplotlib
- Seaborn
- scikit-learn
Follow these steps to use the project:
- Clone this repository to your local machine.
- Navigate to the project directory.
- Ensure that the required dependencies are installed.
- Run the Jupyter Notebook or Python script to perform customer segmentation.
The dataset is loaded and explored to understand its structure and characteristics. Initial analysis confirms the absence of missing values.
The K-Means algorithm is applied to segment the customers into distinct groups. The optimal number of clusters is determined using the within-cluster sum of squares (WCSS) method. The elbow point in the WCSS graph indicates the suitable number of clusters, which in this case is found to be 5.
Customer segments are visualized on a 2D scatter plot using their annual income and spending score. Each cluster is assigned a unique color, and cluster centroids are highlighted in cyan. This visualization helps in understanding the different customer groups created by the K-Means algorithm.
By utilizing K-Means clustering, customers are effectively grouped into segments based on their spending behaviors and annual income. This segmentation can guide targeted marketing campaigns and personalized customer interactions.
For more detailed information, refer to the Jupyter Notebook or Python script provided in this repository. Explore the code and visualize the results to gain valuable insights into customer segmentation!