Skip to content

Built with Python, Pandas, and Scikit-learn, this machine learning project uses K-Means to cluster website users by behavior. It reveals patterns in engagement and bounce, helping drive data-informed decisions.

License

Notifications You must be signed in to change notification settings

AdityakumarDA/kmeans-web-analytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

kmeans-web-analytics

Built with Python, Pandas, and Scikit-learn, this machine learning project uses K-Means to cluster website users by behavior. It reveals patterns in engagement and bounce, helping drive data-informed decisions.


Table of Contents


Key Features and Benefits

  • User Segmentation: Divides website users into distinct clusters based on their behavior patterns.
  • Behavioral Insights: Identifies common engagement and bounce patterns within each cluster.
  • Data-Driven Decisions: Enables data-informed decisions regarding website optimization, marketing strategies, and user experience improvements.
  • K-Means Clustering: Employs the K-Means algorithm to effectively group users with similar behaviors.
  • Python-Based: Leverages the power and flexibility of Python for data analysis and machine learning.

Prerequisites and Dependencies

Before running this project, ensure you have the following installed:

  • Python (3.6 or higher)
  • Pandas: pip install pandas
  • Scikit-learn: pip install scikit-learn
  • Jupyter Notebook (Optional): pip install notebook

Installation and Setup Instructions

  1. Clone the Repository:

    git clone https://github.com/AdityakumarDA/kmeans-web-analytics.git
    cd kmeans-web-analytics
  2. Install Dependencies:

    It is recommended to create a virtual environment for this project.

    # Create a virtual environment (optional)
    python3 -m venv venv
    source venv/bin/activate  # On Linux/macOS
    # venv\Scripts\activate  # On Windows
    
    # Install the required packages
    pip install pandas scikit-learn notebook
  3. Download the data Ensure that the website_traffic_data.csv is downloaded and placed in the project directory


Usage Examples and API Documentation

This project primarily consists of a Jupyter Notebook (ML_project.ipynb) that demonstrates the usage of K-Means clustering.

  1. Run the Notebook:

    jupyter notebook ML_project.ipynb
  2. Follow the steps within the notebook: The notebook guides you through data loading, preprocessing, K-Means model training, and cluster analysis. It leverages Pandas and Scikit-learn functions directly.

    Example snippet (from notebook concept):

    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler
    
    # Load the data
    data = pd.read_csv("website_traffic_data.csv")
    
    # Select features (e.g., 'engagement', 'bounce_rate')
    features = ['engagement', 'bounce_rate']
    X = data[features]
    
    # Standardize the features
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    
    # Apply K-Means clustering
    kmeans = KMeans(n_clusters=3, random_state=42)  # Example: 3 clusters
    data['cluster'] = kmeans.fit_predict(X_scaled)
    
    # Analyze the clusters
    print(data.groupby('cluster')[features].mean())

Configuration Options

The primary configurable option is the number of clusters (n_clusters) in the K-Means algorithm. This can be adjusted within the ML_project.ipynb notebook. Experiment with different values to find the optimal number of clusters for your dataset. Also the features that are used for clustering are configurable.

kmeans = KMeans(n_clusters=3, random_state=42) # change n_clusters

Contributing Guidelines

Contributions are welcome! To contribute to this project:

  1. Fork the repository.
  2. Create a new branch for your feature or bug fix.
  3. Make your changes and commit them with clear, descriptive messages.
  4. Submit a pull request.

Please ensure your code adheres to Python coding standards and includes appropriate documentation.


License Information

No license specified. All rights reserved by AdityakumarDA.


Acknowledgments


Project Structure

kmeans-web-analytics/
├── ML_project.ipynb
├── website_traffic_data.csv
├── README.md
└── images/
    ├── trafficcost_vs_Search_volume.png
    ├── elbow_plot.png
    └── cluster_scatter.png

Visual Output Snapshots

1. 🔍 Search Volume vs Traffic Cost

Search Volume vs Traffic Cost

This scatter plot visualizes how Search Volume impacts the Traffic Cost for various website keywords or landing pages. It helps identify outliers — e.g., terms with exceptionally high traffic costs or volume. This can assist in budget optimization for paid campaigns or SEO strategy.


2. 💡 Elbow Method to Determine Optimal Clusters

Elbow Plot

The Elbow Method helps us decide the optimal number of clusters (n_clusters) for K-Means. It plots the number of clusters vs the clustering inertia (error). The 'elbow point' (highlighted with a red star) indicates the most efficient number of clusters — beyond which performance gain diminishes. In this project, 2 clusters were optimal.


3. 📊 K-Means Cluster Scatter Plot (Search Volume vs Traffic)

Cluster Scatter Plot

This plot displays final K-Means clustering results, where:

  • Each point is a data sample (a keyword or page).
  • Different colors indicate different user segments (clusters).
  • Red stars mark the centroids (mean position of each cluster).

It provides intuitive insights into user groupings like high-volume, low-cost vs low-volume, high-cost clusters. This is essential for personalized targeting and marketing strategies.


Future Enhancements

  • Add features like session duration, pages per session
  • Use silhouette score for better cluster selection
  • Deploy via Streamlit/Flask for interactivity
  • Add time-series or location-based segmentation

About Me

I'm Aditya Rajput, a data analyst passionate about storytelling with data, unsupervised learning, and real-world analytics.

If you liked this project, please ⭐ the repo!

About

Built with Python, Pandas, and Scikit-learn, this machine learning project uses K-Means to cluster website users by behavior. It reveals patterns in engagement and bounce, helping drive data-informed decisions.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published