Add PoC validation for Multimodal AI Eval Framework (#1226) #1334
Open
Spark960 wants to merge 1 commit into foss42:main from
PR Description
Hi again,
Following up on the idea discussion thread (#1226) and the feedback from my initial idea submission (#1136) where a PoC was requested, I went ahead and built a full Proof of Concept.
I just want to say, building this was incredibly fun and it genuinely showed me how cool and necessary this project is. Getting to see the live evaluation pipeline actually work end-to-end was so insane. This PR updates my original idea document with the PoC results and architecture findings.
Short Summary
- Runs `lm-eval` safely in background threads.
- Intercepts `lm-eval` payloads and cleanses/sanitizes them. This completely solves the vendor-specific schema crashes we see with strict APIs like Gemini and Groq (e.g. Gemini instantly throws a 400 Bad Request if it sees a `seed` parameter). This proves we can make this tool truly vendor neutral!

PoC Repository: https://github.com/Spark960/ai-eval
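To make the sanitization idea concrete, here is a minimal sketch of the kind of payload-cleansing middleware described above. The names (`VENDOR_UNSUPPORTED`, `sanitize_payload`) and the Groq entry are illustrative assumptions, not the PoC's actual code; the `seed`-vs-Gemini case is the one mentioned in this PR.

```python
# Hypothetical sketch of the vendor-payload sanitizer described above.
# Only the Gemini/`seed` example comes from the PR; other entries are illustrative.

VENDOR_UNSUPPORTED = {
    "gemini": {"seed"},        # Gemini returns 400 Bad Request on unknown params
    "groq": {"logit_bias"},    # illustrative assumption, not confirmed
}

def sanitize_payload(payload: dict, vendor: str) -> dict:
    """Drop request parameters the target vendor's API is known to reject."""
    blocked = VENDOR_UNSUPPORTED.get(vendor, set())
    return {k: v for k, v in payload.items() if k not in blocked}

payload = {"model": "gemini-pro", "prompt": "2+2=", "seed": 42}
print(sanitize_payload(payload, "gemini"))
# → {'model': 'gemini-pro', 'prompt': '2+2='}
```

The design point is that the allow/deny logic lives in one table per vendor, so adding a new strict API means adding one entry rather than special-casing call sites.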
Demo: A gif demonstration of the live evaluation pipeline streaming via SSE is available in the PoC readme.
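For readers unfamiliar with SSE, the wire format the demo streams over can be sketched like this; the event shape and field names here are my own illustration, not the PoC's actual schema.

```python
# Minimal sketch of server-sent-events (SSE) framing for evaluation progress.
# Field names ("step", "total") are hypothetical, not from the PoC.
import json

def evaluation_events(total=3):
    """Yield SSE-formatted frames, one per completed evaluation step."""
    for step in range(1, total + 1):
        payload = json.dumps({"step": step, "total": total})
        yield f"data: {payload}\n\n"  # SSE frame: a 'data:' line plus a blank line

for frame in evaluation_events():
    print(frame, end="")
```

Each frame is plain text (`data: ...` followed by a blank line), which is what lets a browser or Flutter client consume live progress without polling.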
I'd love to know your thoughts and critique too :)
@animator and @ashitaprasad
Related Issues
Checklist
- Synced the `main` branch before making this PR
- Flutter upgraded (`flutter upgrade`) and verified
- Tests run (`flutter test`) and all tests are passing (just updating idea doc)

Added/updated tests?
We encourage you to add relevant test cases.
OS on which you have developed and tested the feature?