# model-benchmarking

Here are 11 public repositories matching this topic...

A modular deep learning evaluation framework for benchmarking multiple CNN architectures across varied optimization strategies and training configurations. Built for scalable experimentation and transferability to real-world image classification tasks.

  • Updated Jun 19, 2025
  • Jupyter Notebook
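
Not part of the repository itself, but a minimal sketch of the kind of architecture-vs-optimizer benchmarking grid such a framework runs; the model choices, step counts, and random stand-in data below are illustrative assumptions.

```python
# Hypothetical benchmarking grid: train each (architecture, optimizer) pair for a few
# steps on random stand-in data and record loss and wall-clock time.
import time
import torch
import torch.nn as nn
from torchvision import models

def benchmark(model_fn, optimizer_fn, steps=5, batch_size=8, num_classes=10):
    """Run a short training loop and return simple metrics for one configuration."""
    model = model_fn(num_classes=num_classes)
    optimizer = optimizer_fn(model.parameters())
    criterion = nn.CrossEntropyLoss()
    x = torch.randn(batch_size, 3, 224, 224)          # stand-in for real images
    y = torch.randint(0, num_classes, (batch_size,))  # stand-in labels
    start = time.time()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    return {"final_loss": loss.item(), "seconds": time.time() - start}

architectures = {
    "resnet18": lambda num_classes: models.resnet18(num_classes=num_classes),
    "mobilenet_v3_small": lambda num_classes: models.mobilenet_v3_small(num_classes=num_classes),
}
optimizers = {
    "sgd": lambda params: torch.optim.SGD(params, lr=0.01, momentum=0.9),
    "adam": lambda params: torch.optim.Adam(params, lr=1e-3),
}

for arch_name, arch_fn in architectures.items():
    for opt_name, opt_fn in optimizers.items():
        print(arch_name, opt_name, benchmark(arch_fn, opt_fn))
```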

An in-depth comparison and build walkthrough of two sentiment classification models, one classical ML and one deep learning: a Logistic Regression model and an LSTM, both trained on the Sentiment140 dataset. The Jupyter notebook covers every step, from data preprocessing and model construction through training and evaluation, and compares the strengths and shortcomings of each approach.

  • Updated Nov 13, 2025
  • Jupyter Notebook
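
Not taken from the notebook; a minimal sketch of the classical ML side of such a comparison (TF-IDF features plus Logistic Regression), with a few placeholder tweets standing in for the Sentiment140 columns.

```python
# Hypothetical ML baseline: TF-IDF + Logistic Regression on short texts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = ["great service, loved it", "worst day ever", "pretty happy overall", "not good at all"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (placeholder data)

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, random_state=0, stratify=labels
)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

The LSTM side would follow the same split and evaluation, swapping the pipeline for a tokenizer, embedding layer, and recurrent model.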

A Streamlit web app that uses a Groq-powered LLM (Llama 3) to act as an impartial judge for evaluating and comparing two model outputs. Supports custom criteria, presets like creativity and brand tone, and returns structured scores, explanations, and a winner. Built end-to-end with Python, Groq API, and Streamlit.

  • Updated Nov 24, 2025
  • Python
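
Not the app's actual code; a minimal sketch of an LLM-as-judge call through the Groq Python client. The prompt wording, scoring format, and model identifier are assumptions, and a `GROQ_API_KEY` environment variable is required.

```python
# Hypothetical judge call: ask a Groq-hosted Llama 3 model to compare two outputs.
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def judge(output_a: str, output_b: str, criteria: str = "clarity and creativity") -> str:
    """Compare two candidate outputs against the given criteria and return the verdict text."""
    prompt = (
        f"You are an impartial judge. Criteria: {criteria}.\n\n"
        f"Output A:\n{output_a}\n\nOutput B:\n{output_b}\n\n"
        "Score each output from 1 to 10, explain briefly, and name a winner."
    )
    response = client.chat.completions.create(
        model="llama3-70b-8192",  # assumed model identifier; check Groq's current model list
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(judge("The sky is blue.", "Azure heavens stretch overhead."))
```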
