Mrityunjay Pathak TheMrityunjayPathak

About

Hello! My name is Mrityunjay Pathak.

I'm a data scientist who enjoys building real-world, end-to-end systems.

I love creating projects that don't just stay in notebooks, but are deployed online where people can actually use them.

Some projects I've worked on :

AutoIQ : Car Price Prediction

Built a car price prediction system with FastAPI and Docker, trained on 2,800+ scraped car listings from Cars24.

Deployed an interactive HTML/CSS/JS application on GitHub Pages that fetches real-time predictions via the API.

Dashly : Live Sales Dashboard

Designed a live Power BI dashboard connected to a Neon PostgreSQL database containing 50,000+ sales records.

Automated an ETL pipeline with GitHub Actions to keep the dashboard continuously updated with real-time insights.

Pickify : Movie Recommender System

Developed a content-based movie recommender system using metadata from 5,000+ movies.

Integrated the TMDB API to fetch and display movie posters dynamically, for a personalized user experience.

Tools and Technologies I've worked with :

Programming Language : Python, SQL

Libraries : NumPy, Pandas, Matplotlib, Seaborn, Plotly

Machine Learning : Scikit-learn

Database : MySQL

BI Tool : Power BI, Excel

Web Framework : FastAPI

Containerization : Docker

Version Control : Git

Automation : GitHub Actions

Shell Scripting : Bash

I'm currently seeking opportunities as a Data Scientist or a Machine Learning Engineer, where I can contribute to building data-driven solutions that create measurable business impact.

If you're looking for someone who's eager to learn, collaborate and deliver results, I'd love to connect and explore how I can add value to your team.

Get in Touch

Kaggle  ✦  LinkedIn  ✦  GitHub  ✦  Medium  ✦  Portfolio

Projects

AutoIQ : Car Price Prediction



➔ Problem

In the used car market, buyers and sellers often struggle to determine a fair price for their vehicles.

This project aims to provide accurate and transparent pricing for used cars by analyzing real-world data.

➔ Solution

Built and deployed an end-to-end machine learning pipeline to predict used car prices from real-world data.

Collected and cleaned 2,800+ used car records from Cars24 using Selenium and BeautifulSoup.

Optimized memory consumption of the dataset by downcasting data types and converting to Parquet format.

Trained models with Scikit-learn Pipelines & ColumnTransformer to avoid data leakage.

Deployed the machine learning model as an API using FastAPI on Render.

Built an HTML/CSS/JS application hosted on GitHub Pages to interact with the API and display predictions in real time.

Containerized the entire application using Docker and pushed to Docker Hub for reproducibility.
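A minimal sketch of the leakage-safe training setup mentioned above, assuming scikit-learn and pandas are installed. The column names, the toy values and the Ridge regressor are hypothetical stand-ins, not the project's actual features or model. The point it illustrates: when preprocessing lives inside the Pipeline, every cross-validation fold fits the scaler and encoder on its own training split only.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy listings; the real dataset has 2,800+ rows and its own columns.
listings = pd.DataFrame({
    "km_driven": [12000, 45000, 80000, 30000, 60000, 15000, 95000, 52000],
    "age_years": [1, 4, 8, 3, 6, 2, 9, 5],
    "fuel_type": ["petrol", "diesel", "diesel", "petrol",
                  "cng", "petrol", "diesel", "cng"],
    "price": [7.5, 5.2, 3.1, 6.4, 3.8, 7.0, 2.6, 4.1],
})
X = listings.drop(columns="price")
y = listings["price"]

# Scale numeric columns and one-hot encode categorical ones in a single step.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["km_driven", "age_years"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["fuel_type"]),
])

# Preprocessing and model are fitted together, so no statistics from a
# validation split leak into the transformers.
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", Ridge(alpha=1.0)),  # assumed model, for illustration only
])

scores = cross_val_score(model, X, y, cv=4, scoring="neg_mean_absolute_error")
cv_mae = -scores.mean()

model.fit(X, y)
predicted = model.predict(X.head(2))
print(f"cross-validated MAE: {cv_mae:.2f}")
```

Fitting the scaler on the full dataset before splitting would let validation rows influence the training features, which is the leakage this structure avoids.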

➔ Results

Reduced dataset memory usage by 90% using optimized storage techniques.

Achieved a 30% lower MAE and a 12% higher R² score compared to the baseline model.

Improved model stability by 70%, giving more consistent and reliable predictions.
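The memory reduction reported above comes from downcasting data types. A rough sketch of that idea, assuming pandas is installed; the columns and values are hypothetical, and the saving on this toy frame will not match the 90% figure from the real dataset.

```python
import pandas as pd

# Hypothetical columns, repeated to 1,000 rows so the saving is visible.
cars = pd.DataFrame({
    "km_driven": [12000, 45000, 80000, 30000] * 250,
    "age_years": [1, 4, 8, 3] * 250,
    "fuel_type": ["petrol", "diesel", "diesel", "cng"] * 250,
    "price": [7.5, 5.2, 3.1, 6.4] * 250,
})


def memory_bytes(frame):
    """Total memory of a DataFrame, counting string contents as well."""
    return int(frame.memory_usage(deep=True).sum())


before = memory_bytes(cars)

compact = cars.copy()
# Shrink integers to the smallest type that still holds every value.
for col in ["km_driven", "age_years"]:
    compact[col] = pd.to_numeric(compact[col], downcast="integer")
# Shrink floats where pandas considers it safe.
compact["price"] = pd.to_numeric(compact["price"], downcast="float")
# Low-cardinality text is far cheaper as a categorical.
compact["fuel_type"] = compact["fuel_type"].astype("category")

after = memory_bytes(compact)
saving_pct = 100 * (before - after) / before
print(f"{before} -> {after} bytes ({saving_pct:.0f}% smaller)")

# Writing the result with compact.to_parquet(...) would keep these dtypes on
# disk; that call needs pyarrow or fastparquet, so it is left out here.
```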

➔ Impact

Helps car owners quickly find the right selling price for their vehicles based on real-world data.

Makes it easier for buyers to know if a car is fairly priced before making a purchase.

Dashly : Live Sales Dashboard



➔ Problem

Quick Buy is a leading superstore operating across the United States.

It manages thousands of product transactions daily across multiple regions.

The store's operations relied on manual spreadsheets and SQL queries to track business performance.

As a result, decision-making slowed down and growth opportunities became harder to identify.

➔ Solution

Designed a fully automated ETL pipeline using Python, SQLAlchemy and GitHub Actions for seamless daily data updates.

Built custom Python ETL scripts to extract, transform and load 50,000+ sales records into a Neon PostgreSQL database.

Automated daily data generation (~100 new transactions per day) to simulate real-time sales activity.

Integrated Power BI directly with the database, enabling an auto-refreshing dashboard without manual uploads.
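The extract, transform and load steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: it uses the standard-library sqlite3 module with an in-memory database as a stand-in for SQLAlchemy and Neon PostgreSQL, and the table layout, column names and segment list are hypothetical.

```python
import random
import sqlite3
from datetime import date

# Hypothetical segment values used by the simulated generator.
SEGMENTS = ["Consumer", "Corporate", "Home Office"]


def extract(day, n_rows=100, seed=42):
    """Simulate one day's raw transactions (stand-in for the data generator)."""
    rng = random.Random(seed)
    return [
        {
            "order_id": f"{day.isoformat()}-{i:04d}",
            "order_date": day.isoformat(),
            "segment": rng.choice(SEGMENTS),
            "sales": round(rng.uniform(10, 500), 2),
            "cost": round(rng.uniform(5, 400), 2),
        }
        for i in range(n_rows)
    ]


def transform(rows):
    """Derive profit and keep only rows with positive sales."""
    return [
        (r["order_id"], r["order_date"], r["segment"],
         r["sales"], round(r["sales"] - r["cost"], 2))
        for r in rows
        if r["sales"] > 0
    ]


def load(conn, records):
    """Upsert by order_id so re-running the same day adds no duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "order_id TEXT PRIMARY KEY, order_date TEXT, segment TEXT, "
        "sales REAL, profit REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO sales VALUES (?, ?, ?, ?, ?)", records
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of PostgreSQL
run_day = date(2025, 1, 1)
load(conn, transform(extract(run_day)))
load(conn, transform(extract(run_day)))  # a repeated run stays idempotent
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(row_count)
```

Keying the load on a stable order identifier is one way to make a scheduled job safe to retry, which matters when a workflow runner re-executes a failed or duplicated run.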

➔ Key Insights

Standard Class drives ~60% of sales (~₹5.1M) and profit (~₹897K), making it the most profitable and preferred shipping mode.

Consumer Segment generates ~50% of revenue (~₹4.26M) and profit (~₹757K), highlighting it as the primary customer base.

Q4 (Oct–Dec) delivers ~27% of yearly revenue, highlighting strong seasonal demand, ideal for marketing and promotions.

Paper, Binders and Phones emerge as top-performing sub-categories, together making up ~45% of total revenue.

West and East regions lead the market with ~58% of total sales, while the South region with ~19% shows room for growth.

Top 5 States (CA, NY, TX, PA, OH) contribute ~54% of sales, with CA alone driving ~21%, showing strong regional concentration.

➔ Impact

Enabled real-time insights through Power BI dashboards with automatic daily refresh.

Reduced daily data update time from hours to under a minute (average ~40 sec) using GitHub Actions.

Delivered a reliable, low-latency, fully automated data pipeline with zero manual intervention.

Achieved 100% workflow reliability as recorded in GitHub Actions, with zero pipeline failures since deployment.

Pickify : Movie Recommender System



➔ Problem

With the rise of streaming services, viewers now have access to thousands of movies across platforms.

As a result, many viewers spend more time browsing than actually watching.

This problem can lead to frustration, lower satisfaction and less time spent on the platform.

Ultimately, this impacts both user experience and business performance.

➔ Solution

Built a content-based movie recommender system trained on 5,000+ movie metadata records.

Generated the top 5 similar titles for any selected movie in under 3 seconds.

Integrated the TMDB API to dynamically fetch and display movie posters, enhancing user experience.

Deployed the system as a Streamlit web app, used by 100+ users to discover personalized movie suggestions.
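A dependency-free sketch of content-based recommendation, assuming bag-of-words vectors over metadata tags and cosine similarity. The similarity measure and the vectorization used in the actual project are not stated here, so both are assumptions, and the titles and tags below are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical metadata; real records would combine genres, keywords, cast, etc.
MOVIES = {
    "Star Quest": "space adventure alien pilot galaxy",
    "Galaxy Run": "space pilot galaxy chase adventure",
    "Love in Paris": "romance paris love comedy",
    "Paris Nights": "romance paris drama love",
    "Deep Ocean": "ocean documentary nature whale",
    "Alien Dawn": "alien invasion space thriller",
}


def vectorize(text):
    """Turn a tag string into a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


VECTORS = {title: vectorize(tags) for title, tags in MOVIES.items()}


def recommend(title, top_n=5):
    """Return the top_n titles most similar to the selected movie."""
    query = VECTORS[title]
    scored = [
        (cosine(query, vector), other)
        for other, vector in VECTORS.items()
        if other != title
    ]
    # Highest similarity first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:top_n]]


print(recommend("Star Quest", top_n=2))
```

With a few thousand movies, the pairwise similarities can be precomputed once so that each lookup only sorts one row of scores.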

➔ Impact

If scaled and integrated with a streaming service, this system could :

Reduce the time users spend choosing what to watch.

Increase user engagement, watch time and customer satisfaction.

Help streaming platforms retain users by offering better personalized content.

Netflix Data Analysis



➔ Problem Statement

Analyze Netflix content data to uncover insights into how the platform has evolved over time.

➔ Some Key Findings

Cleaned and analyzed a dataset of 8,000+ Netflix Movies and TV Shows.

More than 60% of the content on Netflix is rated for mature audiences.

Suggests that Netflix targets adult viewers to boost engagement and retention.

More than 25% of the Movies and TV Shows were released on the 1st day of the month.

Shows a consistent release schedule, likely aligned with subscription renewal cycles.

More than 40% of the content on Netflix is exclusive to the United States.

Shows a strong focus on the U.S. market and content availability by location.

More than 20% of the content on Netflix falls under the "Drama" genre.

Confirms that "Drama" is a key part of Netflix's content library.

More than 23% of the content on Netflix was released in 2019 alone.

Indicates a major content push that year, possibly tied to growth or user acquisition efforts.
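Percentage findings like the ones above reduce to counting the rows that match a condition and dividing by the total. A small sketch using only the standard library; the rows are invented, and the set of ratings treated as "mature" is an assumption made for this example.

```python
from collections import Counter
from datetime import date

# Hypothetical rows: (title, type, rating, release date).
CATALOG = [
    ("Show A", "TV Show", "TV-MA", date(2019, 1, 1)),
    ("Movie B", "Movie", "PG-13", date(2019, 3, 15)),
    ("Movie C", "Movie", "TV-MA", date(2020, 6, 1)),
    ("Show D", "TV Show", "TV-14", date(2018, 11, 20)),
    ("Movie E", "Movie", "R", date(2019, 7, 4)),
]

# Assumption: which ratings count as mature is a judgment call.
MATURE_RATINGS = {"TV-MA", "R", "NC-17"}


def share(rows, predicate):
    """Percentage of rows for which predicate(row) is true."""
    return 100 * sum(1 for row in rows if predicate(row)) / len(rows)


mature_pct = share(CATALOG, lambda row: row[2] in MATURE_RATINGS)
first_day_pct = share(CATALOG, lambda row: row[3].day == 1)
titles_per_year = Counter(row[3].year for row in CATALOG)
busiest_year, busiest_count = titles_per_year.most_common(1)[0]

print(mature_pct, first_day_pct, busiest_year)
```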

Supermarket Sales Analysis



➔ Problem Statement

Analyze supermarket sales data to identify key factors for improving profitability and operational efficiency.

➔ Some Key Findings

Analyzed purchasing patterns of 9,000+ customers of a Supermarket.

More than 15% of the products sold were Snacks.

Shows that Snacks are a convenient choice and a major source of revenue.

More than 32% of total sales came from the West region of the Supermarket.

Suggests that the West region is a strong-performing area compared to the others.

Health and Soft drinks were the most profitable sub-categories in Beverages.

Shows that both types of drinks perform well among customers.

November was the most profitable month, contributing about 15% of the total annual profit.

Makes it an ideal time for running promotions and special offers.

Certificates



Blogs
