Comparing Commercial and Open-Source Language Models for Sustainable AI
This repository presents the ELO2 – GREEN AI Project, developed within the MIT Emerging Talent – AI & ML Program (2025). The work investigates the technical performance, sustainability traits, and human-perceived quality of open-source language models compared to commercial systems.
To what extent can open-source LLMs provide competitive output quality while operating at significantly lower environmental cost?
Large commercial LLMs deliver strong performance but demand substantial compute and energy. This project examines whether small, accessible, and environmentally efficient open-source models, especially when enhanced with retrieval and refinement pipelines, can offer practical alternatives for everyday tasks.
The study evaluates several open-source model groups:
- Quantized Model: Mistral-7B (GGUF)
- Distilled Model: LaMini-Flan-T5-248M
- Small Models: Qwen, Gemma
- Enhanced Pipelines (applied to all model families):
  - RAG (Retrieval-Augmented Generation)
  - Recursive Editing, which includes AI-based critique and iterative refinement
These configurations serve as the optimized open-source setups used in the comparison against commercial models.
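For illustration only, the sketch below shows one way the quantized and distilled baselines can be loaded locally with llama-cpp-python and Hugging Face transformers; the GGUF file path, quantization level, and generation settings are assumptions rather than the project's exact configuration.

```python
# Illustrative sketch (assumed paths and settings, not the project's exact code):
# load the quantized Mistral-7B GGUF baseline and the distilled LaMini-Flan-T5-248M.
from llama_cpp import Llama
from transformers import pipeline

# Quantized model: Mistral-7B in GGUF format, runnable on CPU.
mistral = Llama(model_path="models/mistral-7b-instruct.Q4_K_M.gguf", n_ctx=4096)

# Distilled model: LaMini-Flan-T5-248M via a text2text-generation pipeline.
lamini = pipeline("text2text-generation", model="MBZUAI/LaMini-Flan-T5-248M")

prompt = "Summarize the Apollo 11 landing in one sentence."
print(mistral(prompt, max_tokens=64)["choices"][0]["text"])
print(lamini(prompt, max_new_tokens=64)[0]["generated_text"])
```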
Evaluation tasks include:
- summarization
- factual reasoning
- paraphrasing
- short creative writing
- instruction following
- question answering
A targeted excerpt from the Apollo-11 mission transcripts served as the central reference text for all evaluation tasks. All prompts were constructed directly from this shared material. Using a single, consistent source ensured that every model was tested under identical informational conditions, allowing clear and fair comparison of output quality and relevance.
Retrieval-Augmented Generation (RAG) was applied to multiple model families. The pipeline includes:
- document indexing
- dense similarity retrieval
- context injection through prompt augmentation
- answer synthesis using guidance prompts
RAG improved factual grounding in nearly all models.
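A minimal sketch of how such a pipeline can be assembled is shown below. It uses sentence-transformers embeddings and cosine similarity as an assumed retrieval backend; the actual indexing stack, chunking, and prompt wording in the project may differ.

```python
# Minimal RAG sketch over the Apollo-11 transcript (assumed retrieval backend;
# not the project's exact pipeline).
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def build_index(chunks):
    """Document indexing: encode transcript chunks into normalized dense vectors."""
    return embedder.encode(chunks, normalize_embeddings=True)

def retrieve(question, chunks, index, k=3):
    """Dense similarity retrieval: return the k chunks closest to the question."""
    q = embedder.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(index @ q)[::-1][:k]
    return [chunks[i] for i in top]

def augment_prompt(question, context_chunks):
    """Context injection: prepend retrieved passages to a guidance prompt."""
    context = "\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

# Usage: index the transcript chunks once, then build an augmented prompt per
# question and pass it to any of the open-source models for answer synthesis.
```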
A lightweight iterative refinement procedure was implemented:
- Draft Generation: The primary model produces an initial output.
- AI-Based Critique: A secondary SLM evaluates clarity, accuracy, faithfulness, and relevance.
- Refinement Step: A revision prompt integrates the critique and generates an improved text.
- Stopping Condition: The cycle ends after a fixed number of iterations or when the critique stabilizes.
This approach allowed weaker SLMs to yield higher-quality results without relying on large models.
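The loop below is a hedged sketch of this draft-critique-revise cycle; `generate` and `critique` stand in for whichever primary model and critic SLM are wired in, and the stopping heuristic is illustrative.

```python
# Illustrative recursive-editing loop (callables are placeholders for the
# primary model and the critic SLM; not the project's exact implementation).
def recursive_edit(prompt, generate, critique, max_iters=3):
    draft = generate(prompt)                       # draft generation
    previous_feedback = None
    for _ in range(max_iters):
        feedback = critique(draft)                 # AI-based critique
        if feedback == previous_feedback:          # stop when critique stabilizes
            break
        revision_prompt = (
            f"Original task:\n{prompt}\n\nDraft:\n{draft}\n\n"
            f"Critique:\n{feedback}\n\nRewrite the draft to address the critique."
        )
        draft = generate(revision_prompt)          # refinement step
        previous_feedback = feedback
    return draft
```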
Environmental footprint data was captured with CodeCarbon, recording:
- CPU/GPU energy usage
- Carbon emissions
- PUE-adjusted overhead
These measurements enabled comparison with published metrics for commercial LLMs.
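CodeCarbon's `EmissionsTracker` can wrap each evaluation run as in the sketch below; the project name and the placeholder workload are assumptions, not the project's actual tracking script.

```python
# Sketch of per-run carbon tracking with CodeCarbon (configuration values are
# assumptions; the placeholder workload stands in for model inference).
from codecarbon import EmissionsTracker

def run_evaluation_tasks():
    # Placeholder for running the six evaluation tasks on a given model.
    return sum(i * i for i in range(10_000_000))

tracker = EmissionsTracker(project_name="green-ai-eval")
tracker.start()
try:
    run_evaluation_tasks()
finally:
    emissions_kg = tracker.stop()  # estimated emissions in kg CO2-equivalent
print(f"Estimated emissions: {emissions_kg:.6f} kg CO2eq")
```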
A structured Google Form experiment collected:
- source identification (commercial vs. open-source)
- quality ratings on accuracy, faithfulness, relevance, and clarity (1–5 scale)
Outputs were randomized and anonymized to avoid bias. This provided a perception-based counterpart to technical evaluation.
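The blinding step can be as simple as the sketch below, which shuffles model outputs and hides each model's identity behind a neutral label before the texts go into the form; the data layout and label format are illustrative assumptions.

```python
# Illustrative blinding step for the human evaluation: shuffle outputs and
# replace model identities with neutral labels (data layout is an assumption).
import random

def anonymize_outputs(outputs, seed=42):
    """outputs: list of (model_name, text) pairs. Returns blinded items and the key."""
    rng = random.Random(seed)
    shuffled = list(outputs)
    rng.shuffle(shuffled)
    blinded, answer_key = [], {}
    for i, (model_name, text) in enumerate(shuffled, start=1):
        label = f"Sample {i}"
        blinded.append({"label": label, "text": text})
        answer_key[label] = model_name  # kept separately, never shown to raters
    return blinded, answer_key
```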
....
....
- FINDING1.....
- FINDING2.....
- FINDING3.....
- FINDING4.....
- Evaluate additional open-source model families across diverse tasks
- Test optimized pipelines in specialized domains (medical, legal, technical writing)
- Track carbon footprint across full lifecycle (training to deployment)
- Conduct ablation studies isolating RAG vs. recursive editing contributions
The research findings will be shared through formats designed for different audiences and purposes:
A comprehensive research article will document the complete experimental design, statistical analysis, and implications.
View Article
An executive presentation provides a visual overview of the research question, methodology, and key findings without requiring deep technical background.
View Presentation
A public evaluation study invites participation in assessing AI-generated texts. This crowdsourced data forms a critical component of the research.
Participate in Study
All materials (dataset, prompts, model outputs, evaluation scripts, and carbon tracking logs) are publicly available in this repository.
Browse Repository
Special thanks to the MIT Emerging Talent Program for their guidance and feedback throughout the project.

