About

Hello! My name is Mrityunjay Pathak.

I'm a data scientist who enjoys building real-world, end-to-end systems.

I love creating projects that don't just stay in notebooks, but are deployed online where people can actually use them.

Some projects I've worked on :

  • AutoIQ : Car Price Prediction
    • Built a car price prediction system with FastAPI and Docker, trained on 2,800+ scraped car listings from Cars24.
    • Deployed an interactive HTML/CSS/JS application on GitHub Pages that fetches real-time predictions via the API.
  • Dashly : Live Sales Dashboard
    • Designed a live Power BI dashboard connected to a Neon PostgreSQL database, containing 50,000+ sales records.
    • Automated an ETL pipeline with GitHub Actions to keep the dashboard continuously updated with real-time insights.
  • Pickify : Movie Recommender System
    • Developed a content-based movie recommender system using metadata from 5,000+ movies.
    • Integrated the TMDB API to fetch and display movie posters dynamically, for a personalized user experience.

Tools and Technologies I've worked with :

  • Programming Language : Python, SQL
  • Libraries : NumPy, Pandas, Matplotlib, Seaborn, Plotly
  • Machine Learning : Scikit-learn
  • Database : MySQL
  • BI Tool : Power BI, Excel
  • Web Framework : FastAPI
  • Containerization : Docker
  • Version Control : Git
  • Automation : GitHub Actions
  • Shell Scripting : Bash

I'm currently seeking opportunities as a Data Scientist or a Machine Learning Engineer, where I can contribute to building data-driven solutions that create measurable business impact.

If you're looking for someone who's eager to learn, collaborate and deliver results, I'd love to connect and explore how I can add value to your team.

Get in Touch

Kaggle  ✦  LinkedIn  ✦  GitHub  ✦  Medium  ✦  Portfolio

Projects

AutoIQ : Car Price Prediction

    

➔ Problem

  • In the used car market, buyers and sellers often struggle to determine a fair price for their vehicles.
  • This project aims to provide accurate and transparent pricing for used cars by analyzing real-world data.

➔ Solution

  • Built and deployed an end-to-end machine learning pipeline to predict used car prices from real-world data.
  • Collected and cleaned 2,800+ used car records from Cars24 using Selenium and BeautifulSoup.
  • Optimized memory consumption of the dataset by downcasting data types and converting to Parquet format.
  • Trained models with Scikit-learn Pipelines & ColumnTransformer to avoid data leakage.
  • Deployed the machine learning model as an API using FastAPI on Render.
  • Built an HTML/CSS/JS application hosted on GitHub Pages to interact with the API and display predictions in real time.
  • Containerized the entire application using Docker and pushed to Docker Hub for reproducibility.
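
The data-leakage point above comes down to one rule: a preprocessing step should learn its statistics from the training split only and then reuse them on held-out data, which is what fitting a Scikit-learn Pipeline on the training split arranges. The sketch below is a dependency-free illustration of that principle, not the project's code; the `StandardScalerSketch` class and the kilometre values are hypothetical.

```python
from statistics import mean, pstdev


class StandardScalerSketch:
    """Hypothetical stand-in for a scaler: learns mean/std in fit(), reuses them in transform()."""

    def fit(self, values):
        self.mean_ = mean(values)
        # Guard against zero spread so transform() never divides by zero.
        self.std_ = pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


# Made-up kilometres-driven values, split BEFORE any preprocessing.
train_km = [10_000, 20_000, 30_000, 40_000]
test_km = [100_000]

# Leak-free: fit on the training split only, then apply the same statistics to the test split.
scaler = StandardScalerSketch().fit(train_km)
train_scaled = scaler.transform(train_km)
test_scaled = scaler.transform(test_km)

# Leaky alternative: statistics computed on train + test together,
# so information about the held-out row shifts the learned mean.
leaky = StandardScalerSketch().fit(train_km + test_km)

print("train-only mean:", scaler.mean_)
print("leaky mean:", leaky.mean_)
```

The two printed means differ because the leaky scaler has already seen the held-out value; wrapping preprocessing and model in one pipeline and fitting it on the training split avoids that.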

➔ Results

  • Reduced dataset memory usage by 90% using optimized storage techniques.
  • Achieved a 30% lower MAE and a 12% higher R2-score compared to the baseline model.
  • Improved model stability by 70%, giving more consistent predictions.
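
For reference, the two metrics quoted above follow their standard definitions: MAE is the mean of |y − ŷ|, and R² is 1 − SS_res / SS_tot. The sketch below computes both in plain Python and shows how a relative MAE reduction is derived. The prices and predictions are made-up illustrative values, not the project's data, and the helper names `mae` and `r2` are hypothetical.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up prices (in arbitrary units) and two sets of predictions.
y_true = [4.0, 6.0, 8.0, 10.0]
baseline_pred = [5.0, 5.0, 9.0, 9.0]
tuned_pred = [4.5, 6.5, 7.5, 9.5]

baseline_mae = mae(y_true, baseline_pred)
tuned_mae = mae(y_true, tuned_pred)

# Relative reduction in MAE versus the baseline, as a fraction.
mae_reduction = (baseline_mae - tuned_mae) / baseline_mae

print(f"MAE: {baseline_mae} -> {tuned_mae} ({mae_reduction:.0%} lower)")
print(f"R2: {r2(y_true, baseline_pred):.2f} -> {r2(y_true, tuned_pred):.2f}")
```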

➔ Impact

  • Helps car owners quickly find the right selling price for their vehicles based on real-world data.
  • Makes it easier for buyers to know if a car is fairly priced before making a purchase.

Dashly : Live Sales Dashboard

  

➔ Problem

  • Quick Buy is a leading superstore operating across the United States.
  • It manages thousands of product transactions daily across multiple regions.
  • The store's operations relied on manual spreadsheets and SQL queries to track business performance.
  • As a result, decision-making slowed down and growth opportunities were harder to identify.

➔ Solution

  • Designed a fully automated ETL pipeline using Python, SQLAlchemy and GitHub Actions for seamless daily data updates.
  • Built custom Python ETL scripts to extract, transform and load 50,000+ sales records into a Neon PostgreSQL database.
  • Automated daily data generation (~100 new transactions daily) to simulate real-time sales activity.
  • Integrated Power BI directly with the database, enabling an auto-refreshing dashboard without manual uploads.
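
The extract, transform and load flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's pipeline: it uses the standard-library `sqlite3` module as a stand-in for the Neon PostgreSQL database and SQLAlchemy, and the `sales` table, its columns and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical raw rows, as they might arrive from a daily extract step.
RAW_ROWS = [
    {"order_id": "A-1", "region": " west ", "sales": "120.50"},
    {"order_id": "A-2", "region": "East", "sales": "80"},
    {"order_id": "A-1", "region": "West", "sales": "120.50"},  # duplicate order
]


def transform(rows):
    """Normalise the region text and cast sales to float."""
    return [
        (r["order_id"], r["region"].strip().title(), float(r["sales"]))
        for r in rows
    ]


def load(conn, records):
    """Write records into the (hypothetical) sales table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id TEXT PRIMARY KEY, region TEXT, sales REAL)"
    )
    # INSERT OR REPLACE keeps the load idempotent: re-running the daily job
    # overwrites an existing order instead of duplicating it.
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the real database
load(conn, transform(RAW_ROWS))
load(conn, transform(RAW_ROWS))  # simulate the scheduled job running twice

row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
west_total = conn.execute(
    "SELECT SUM(sales) FROM sales WHERE region = 'West'"
).fetchone()[0]
print(row_count, west_total)
```

Keying the table on the order identifier is one design choice that makes a scheduled job safe to re-run; the project may handle duplicates differently.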

➔ Key Insights

  • Standard Class drives ~60% of sales (~₹5.1M) and profit (~₹897K), making it the most profitable and preferred shipping mode.
  • Consumer Segment generates ~50% of revenue (~₹4.26M) and profit (~₹757K), highlighting it as the primary customer base.
  • Q4 (Oct–Dec) delivers ~27% of yearly revenue, highlighting strong seasonal demand, ideal for marketing and promotions.
  • Paper, Binders and Phones emerge as top-performing sub-categories, together making up ~45% of total revenue.
  • West and East regions lead the market with ~58% of total sales, while the South region with ~19% shows room for growth.
  • Top 5 States (CA, NY, TX, PA, OH) contribute ~54% of sales, with CA alone driving ~21%, showing strong regional concentration.

➔ Impact

  • Enabled real-time insights through Power BI dashboards with automatic daily refresh.
  • Reduced daily data update time from hours to under a minute (average ~40 sec) using GitHub Actions.
  • Delivered a reliable, low-latency, fully automated data pipeline with zero manual intervention.
  • Achieved 100% workflow reliability as recorded in the GitHub Actions run history, with zero pipeline failures since deployment.

Pickify : Movie Recommender System

  

➔ Problem

  • With the rise of streaming services, viewers now have access to thousands of movies across platforms.
  • As a result, many viewers spend more time browsing than actually watching.
  • This problem can lead to frustration, lower satisfaction and less time spent on the platform.
  • Ultimately, this impacts both user experience and business performance.

➔ Solution

  • Built a content-based movie recommender system trained on 5,000+ movie metadata records.
  • Generated the top 5 similar titles for any selected movie in under 3 seconds.
  • Integrated the TMDB API to dynamically fetch and display movie posters, enhancing user experience.
  • Deployed the system as a Streamlit web app, used by 100+ users to discover personalized movie suggestions.
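
One common way to build a content-based recommender is to turn each movie's metadata into a bag-of-words vector and rank the other titles by cosine similarity. The source does not state which vectorizer or similarity measure Pickify uses, so the sketch below is only an illustration of the general technique; the `MOVIES` catalogue, its tags and the function names are made up.

```python
import math
from collections import Counter

# Hypothetical catalogue: title -> space-separated metadata tags.
MOVIES = {
    "Space Quest": "space adventure alien crew",
    "Alien Dawn": "alien invasion space war",
    "Love in Paris": "romance paris drama",
    "Paris Nights": "drama paris crime",
}


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(title, k=5):
    """Return up to k titles most similar to `title`, best match first."""
    vectors = {name: Counter(tags.split()) for name, tags in MOVIES.items()}
    target = vectors[title]
    scored = [
        (cosine(target, vec), name)
        for name, vec in vectors.items()
        if name != title
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


print(recommend("Space Quest", k=2))
```

With a real catalogue the vectors would typically be precomputed once, so that each lookup only needs a similarity ranking.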

➔ Impact

  • If scaled and integrated with a streaming service, this system could :
    • Reduce the time users spend choosing what to watch.
    • Increase user engagement, watch time and customer satisfaction.
    • Help streaming platforms retain users by offering better personalized content.

Netflix Data Analysis

  

➔ Problem Statement

  • To analyze Netflix content data, uncovering valuable insights into how the platform evolves over time.

➔ Some Key Findings

  • Cleaned and analyzed a dataset of 8,000+ Netflix Movies and TV Shows.
  • More than 60% of the content on Netflix is rated for mature audiences.
    • Suggests that Netflix targets adult viewers to boost engagement and retention.
  • More than 25% of the Movies and TV Shows were released on the 1st day of the month.
    • Shows a consistent release schedule, likely aligned with subscription renewal cycles.
  • More than 40% of the content on Netflix is exclusive to the United States.
    • Shows a strong focus on the U.S. market and content availability by location.
  • More than 20% of the content on Netflix falls under the "Drama" genre.
    • Confirms that "Drama" is a key part of Netflix's content library.
  • More than 23% of the content on Netflix was released in 2019 alone.
    • Indicates a major content push that year, possibly tied to growth or user acquisition efforts.

Supermarket Sales Analysis

  

➔ Problem Statement

  • To analyze Supermarket Sales data, identifying key factors for improving profitability and operational efficiency.

➔ Some Key Findings

  • Analyzed purchasing patterns of 9,000+ customers of a Supermarket.
  • More than 15% of the products sold were Snacks.
    • Shows that Snacks are a convenient choice and a major source of revenue.
  • More than 32% of total sales came from the West region of the Supermarket.
    • Suggests that the West region is a strong-performing area compared to the others.
  • Health and Soft drinks were the most profitable sub-categories in Beverages.
    • Shows that both types of drinks perform well among customers.
  • November was the most profitable month contributing about 15% of the total annual profits.
    • Makes it an ideal time for running promotions and special offers.

Certificates

  

Blogs

  

Pinned

  1. TheMrityunjayPathak.github.io (Public)

    Portfolio Website deployed on GitHub Pages

    HTML 1

  2. AutoIQ (Public)

    Thinking of buying or selling, Start with AutoIQ

    Jupyter Notebook

  3. Dashly (Public)

    Get smarter insights, right when you need them

    Python

  4. Pickify (Public)

    Smart movie picks, based on what you love

    Jupyter Notebook

  5. Netflix-Data-Analysis (Public)

    Netflix Data Analysis

    Jupyter Notebook

  6. Supermarket-Sales-Analysis (Public)

    Supermarket Sales Analysis

    Jupyter Notebook