---
title: "Batch LLM Evaluator"
sidebarTitle: "Batch LLM Evaluator"
description: "This example project evaluates multiple LLM models using the Vercel AI SDK and streams updates to the frontend using Trigger.dev Realtime."
---

import RealtimeLearnMore from "/snippets/realtime-learn-more.mdx";

## Overview

This demo is a full-stack example that uses the following:

- A [Next.js](https://nextjs.org/) app with [Prisma](https://www.prisma.io/) for the database.
- Trigger.dev [Realtime](https://trigger.dev/launchweek/0/realtime) to stream updates to the frontend.
- The Vercel [AI SDK](https://sdk.vercel.ai/docs/introduction) to work with multiple LLM models (OpenAI, Anthropic, and xAI).
- The new [`batch.triggerByTaskAndWait`](https://trigger.dev/docs/triggering#batch-triggerbytaskandwait) method to distribute the evaluation work across multiple tasks.

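The fan-out pattern in the last bullet can be sketched roughly as follows. This is a hypothetical, simplified version for illustration (the task IDs, payload shape, and `evaluateOpenAI` task are assumptions, not the exact code from the repo):

```typescript
import { batch, task } from "@trigger.dev/sdk/v3";

// Hypothetical model-specific task; the real project defines one per provider.
export const evaluateOpenAI = task({
  id: "evaluate-openai",
  run: async (payload: { prompt: string }) => {
    // Call the model via the AI SDK here and return its response.
    return { model: "openai", response: "..." };
  },
});

export const evaluateModels = task({
  id: "evaluate-models",
  run: async (payload: { prompt: string }) => {
    // Fan the same prompt out to the model-specific tasks and wait for all of them.
    const { runs } = await batch.triggerByTaskAndWait([
      { task: evaluateOpenAI, payload },
      // ...one entry per model task
    ]);

    // Collect the outputs of the runs that completed successfully.
    const outputs = [];
    for (const run of runs) {
      if (run.ok) outputs.push(run.output);
    }
    return outputs;
  },
});
```

Because `triggerByTaskAndWait` accepts the task objects themselves, the payloads and outputs stay fully typed end to end.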
## GitHub repo

<Card
  title="View the Batch LLM Evaluator repo"
  icon="GitHub"
  href="https://github.com/triggerdotdev/examples/tree/main/batch-llm-evaluator"
>
  Click here to view the full code for this project in our examples repository on GitHub. You can
  fork it and use it as a starting point for your own project.
</Card>

## Video

<video
  controls
  className="w-full aspect-video"
  src="https://content.trigger.dev/batch-llm-evaluator.mp4"
></video>

## Relevant code

- View the Trigger.dev task code in the [src/trigger/batch.ts](https://github.com/triggerdotdev/examples/blob/main/batch-llm-evaluator/src/trigger/batch.ts) file.
- The `evaluateModels` task uses the `batch.triggerByTaskAndWait` method to distribute the evaluation across the different LLM models.
- It then passes the results through to a `summarizeEvals` task that calculates some dummy "tags" for each LLM response.
- We use the [useRealtimeRunsWithTag](https://trigger.dev/docs/frontend/react-hooks/realtime#userealtimerunswithtag) hook to subscribe to the different evaluation task runs in the [src/components/llm-evaluator.tsx](https://github.com/triggerdotdev/examples/blob/main/batch-llm-evaluator/src/components/llm-evaluator.tsx) file.
- We then pass the relevant run down into three different components, one per model:
  - The `AnthropicEval` component: [src/components/evals/Anthropic.tsx](https://github.com/triggerdotdev/examples/blob/main/batch-llm-evaluator/src/components/evals/Anthropic.tsx)
  - The `XAIEval` component: [src/components/evals/XAI.tsx](https://github.com/triggerdotdev/examples/blob/main/batch-llm-evaluator/src/components/evals/XAI.tsx)
  - The `OpenAIEval` component: [src/components/evals/OpenAI.tsx](https://github.com/triggerdotdev/examples/blob/main/batch-llm-evaluator/src/components/evals/OpenAI.tsx)
- Each of these components then uses the [useRealtimeRunWithStreams](https://trigger.dev/docs/frontend/react-hooks/realtime#userealtimerunwithstreams) hook to subscribe to the streamed LLM responses.

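The last step can be sketched as a minimal component like the one below. This is a hypothetical, simplified illustration (the props, the `"openai"` stream key, and the component body are assumptions, not the repo's actual components):

```typescript
"use client";

import { useRealtimeRunWithStreams } from "@trigger.dev/react-hooks";

// Hypothetical props for illustration; the real components receive the run from a parent.
export function OpenAIEval({ runId, accessToken }: { runId: string; accessToken: string }) {
  const { run, streams, error } = useRealtimeRunWithStreams(runId, { accessToken });

  if (error) return <p>Something went wrong: {error.message}</p>;

  // Each stream key holds the chunks received so far; join them into the text streamed so far.
  const text = (streams.openai ?? []).join("");

  return (
    <div>
      <p>Status: {run?.status}</p>
      <pre>{text}</pre>
    </div>
  );
}
```

The component re-renders as new chunks arrive, so the LLM response appears to type itself out in the UI.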
<RealtimeLearnMore />