Redesign of Manual Evaluations #134

Merged 52 commits on Jun 7, 2024
Changes from 1 commit
Commits
b178105
Pagination bug
karthikscale3 Apr 4, 2024
af47e53
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 Apr 4, 2024
cc6ab77
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 Apr 4, 2024
677b1aa
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 Apr 4, 2024
f996899
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 Apr 4, 2024
0e963e2
Bug fix
karthikscale3 Apr 5, 2024
f3a8a23
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 Apr 5, 2024
3c353a5
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 Apr 5, 2024
94f0fb1
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 Apr 7, 2024
f22f397
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 Apr 8, 2024
5023d1f
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 Apr 9, 2024
4178d5c
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 Apr 11, 2024
4fe4108
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 Apr 13, 2024
784f09c
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 Apr 14, 2024
42f1128
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 Apr 19, 2024
0c8e92b
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 Apr 24, 2024
28227d9
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 Apr 28, 2024
0cec8be
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 Apr 28, 2024
eae354d
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 May 7, 2024
7723147
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 May 7, 2024
31bfd5f
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 May 7, 2024
1b9c986
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 May 9, 2024
31f50e6
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 May 9, 2024
4e30354
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 May 9, 2024
9cbb929
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 May 9, 2024
bee3df8
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 May 9, 2024
186f82b
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 May 13, 2024
2099f1d
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 May 13, 2024
047f9b6
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 May 13, 2024
194c65e
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 May 17, 2024
63eb067
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 May 20, 2024
3a44486
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 May 21, 2024
9ad8c44
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 May 21, 2024
5605411
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 May 21, 2024
eb53696
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 May 21, 2024
1d29604
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 May 22, 2024
020a3a9
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 May 22, 2024
b9b73db
Merge branch 'development' of github.com:Scale3-Labs/langtrace into d…
karthikscale3 May 22, 2024
040b283
Schema changes for evals
karthikscale3 May 24, 2024
9c15a82
Merge branch 'development' of github.com:Scale3-Labs/langtrace into k…
karthikscale3 May 24, 2024
5779e11
Fix APIs for evals
karthikscale3 May 24, 2024
b1dff70
Minor bugfixes
karthikscale3 May 24, 2024
a601f71
Bugfix
karthikscale3 May 24, 2024
c2d2882
Evaluation changes
karthikscale3 Jun 3, 2024
af8c818
Fix merge conflict
karthikscale3 Jun 3, 2024
f7016a7
Manual eval simplification
karthikscale3 Jun 6, 2024
b7429f7
Minor fix
karthikscale3 Jun 6, 2024
6441328
Manual evaluations UX changes
karthikscale3 Jun 6, 2024
f131bb8
Chart title fix
karthikscale3 Jun 6, 2024
8f44d50
Merge branch 'development' of github.com:Scale3-Labs/langtrace into k…
karthikscale3 Jun 6, 2024
c6bf50b
Disable experiments
karthikscale3 Jun 6, 2024
3f9689f
fix
karthikscale3 Jun 7, 2024
Evaluation changes
karthikscale3 committed Jun 3, 2024
commit c2d2882f3aeb07f1ae7aadba485e4d99d7cab076
@@ -31,7 +31,6 @@ import { toast } from "sonner";
 export default function Page() {
 const router = useRouter();
 const projectId = useParams()?.project_id as string;
-const testId = useParams()?.test_id as string;
 const page = parseInt(useSearchParams()?.get("page") || "1");
 // const spanId = useSearchParams()?.get("span_id");

133 changes: 25 additions & 108 deletions app/(protected)/project/[project_id]/evaluate/page-client.tsx
@@ -11,9 +11,8 @@ import { AddtoDataset } from "@/components/shared/add-to-dataset";
 import { Button } from "@/components/ui/button";
 import { Separator } from "@/components/ui/separator";
 import { Skeleton } from "@/components/ui/skeleton";
-import { cn, getChartColor } from "@/lib/utils";
+import { cn } from "@/lib/utils";
 import { Test } from "@prisma/client";
-import { ProgressCircle } from "@tremor/react";
 import { ChevronsRight, RabbitIcon } from "lucide-react";
 import Link from "next/link";
 import { useParams } from "next/navigation";
@@ -104,9 +103,9 @@ export default function PageClient({ email }: { email: string }) {
 ?.average || 0;

 return (
-<div className="w-full flex flex-col">
+<div className="w-full flex flex-col gap-4">
 <div className="md:px-24 px-12 py-12 flex justify-between bg-muted">
-<h1 className="text-3xl font-semibold">Manual Evaluations</h1>
+<h1 className="text-3xl font-semibold">Evaluations</h1>
 <div className="flex gap-2">
 {selectedTest && (
 <Link
@@ -129,113 +128,31 @@ export default function PageClient({ email }: { email: string }) {
 {testAveragesLoading || testsLoading || !tests ? (
 <PageSkeleton />
 ) : tests?.tests?.length > 0 ? (
-<div className="flex flex-row gap-4 absolute top-[14rem] w-full md:px-24 px-12">
-<div className="bg-primary-foreground flex flex-col gap-0 border rounded-md w-[12rem] h-fit">
-{tests?.tests?.map((test: Test, i: number) => {
-const average =
-testAverages?.averages?.find(
-(avg: any) => avg.testId === test?.id
-)?.average || 0;
-return (
-<div className="flex flex-col" key={i}>
-<div
-onClick={() => {
-setSelectedTest(test);
-setCurrentData([]);
-setPage(1);
-setTotalPages(1);
-}}
-className={cn(
-"flex flex-col gap-4 p-4 items-start cursor-pointer",
-i === 0 ? "rounded-t-md" : "",
-i === tests?.tests?.length - 1 ? "rounded-b-md" : "",
-selectedTest?.id === test.id
-? "dark:bg-black bg-white border-l-2 border-primary"
-: ""
-)}
->
-<p
-className={cn(
-"text-sm text-muted-foreground font-semibold capitalize",
-selectedTest?.id === test.id ? "text-primary" : ""
-)}
->
-{test.name}
-</p>
-<ProgressCircle
-color={getChartColor(average)}
-value={average}
-size="sm"
->
-<span className="text-[0.6rem] text-primary font-bold">
-{Math.round(average)}%
-</span>
-</ProgressCircle>
-</div>
-<Separator />
-</div>
-);
-})}
-</div>
-<div className="bg-primary-foreground flex flex-col gap-12 border rounded-md w-full p-4 mb-24">
-<div className="flex flex-row gap-2">
-<div className="flex flex-col gap-4 items-start w-[25rem]">
-<div className="flex flex-col gap-1">
-<h1 className="text-xl font-semibold capitalize break-normal">
-{selectedTest?.name} Evaluation
-</h1>
-<span className="text-xs font-semibold text-muted-foreground">
-Test ID: {selectedTest?.id}
-</span>
-</div>
-<div className="flex flex-col gap-1">
-<span className="text-xs text-muted-foreground font-semibold">
-Evaluation Scale
-</span>
-<span className="text-sm text-primary">
-{selectedTest?.min} to {selectedTest?.max} in steps of +
-{selectedTest?.step}
-</span>
-</div>
-<ProgressCircle
-color={getChartColor(testAverage)}
-value={testAverage}
-size="md"
->
-<span className="text-sm text-primary font-bold">
-{Math.round(testAverage)}%
-</span>
-</ProgressCircle>
-<p className="text-sm text-muted-foreground">
-{selectedTest?.description}
-</p>
-</div>
-{selectedTest && (
-<EvalChart projectId={projectId} test={selectedTest} />
-)}
-</div>
-<div className="flex flex-col gap-2">
-<AddtoDataset
+<div className="flex flex-col gap-12 top-[16rem] w-full md:px-24 px-12 mb-24">
+{selectedTest && (
+<EvalChart projectId={projectId} test={selectedTest} />
+)}
+<div className="flex flex-col gap-2">
+<AddtoDataset
+projectId={projectId}
+selectedData={selectedData}
+className="w-fit self-end"
+/>
+
+{selectedTest && (
+<EvaluationTable
+tests={tests.tests}
 projectId={projectId}
 selectedData={selectedData}
-className="w-fit self-end"
+setSelectedData={setSelectedData}
+currentData={currentData}
+setCurrentData={setCurrentData}
+page={page}
+setPage={setPage}
+totalPages={totalPages}
+setTotalPages={setTotalPages}
 />
-
-{selectedTest && (
-<EvaluationTable
-projectId={projectId}
-test={selectedTest}
-selectedData={selectedData}
-setSelectedData={setSelectedData}
-currentData={currentData}
-setCurrentData={setCurrentData}
-page={page}
-setPage={setPage}
-totalPages={totalPages}
-setTotalPages={setTotalPages}
-/>
-)}
-</div>
+)}
 </div>
 </div>
 ) : (
23 changes: 14 additions & 9 deletions app/api/evaluation/route.ts
@@ -32,16 +32,21 @@ export async function POST(req: NextRequest) {
 );
 }

+const payload: any = {
+spanId,
+traceId,
+projectId,
+userId,
+userScore,
+reason: reason || "",
+};
+
+if (dataId) {
+payload["dataId"] = dataId;
+}
+
 const evaluation = await prisma.evaluation.create({
-data: {
-spanId,
-traceId,
-projectId,
-userId,
-userScore,
-reason: reason || "",
-dataId,
-},
+data: payload,
 });
 return NextResponse.json({
 data: evaluation,
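Review note: the route change above adds `dataId` to the create payload only when it is present, rather than always passing the key. A minimal typed sketch of the same pattern follows. The `EvaluationInput` interface, its field types, and the `buildEvaluationPayload` helper are hypothetical names used for illustration and are not part of this PR; the sketch avoids the `any` annotation by using a conditional spread.

```typescript
// Hypothetical shape of the fields the route reads from the request body.
// Field types are assumptions; the diff does not show the Prisma schema.
interface EvaluationInput {
  spanId: string;
  traceId: string;
  projectId: string;
  userId: string;
  userScore: number;
  reason?: string;
  dataId?: string;
}

// Build the object passed as `data`, including `dataId` only when it is truthy.
// This mirrors the `if (dataId) { payload["dataId"] = dataId; }` guard in the diff.
function buildEvaluationPayload(input: EvaluationInput): Record<string, unknown> {
  const { dataId, reason, ...required } = input;
  return {
    ...required,
    reason: reason || "", // same default as the route: empty string when missing
    ...(dataId ? { dataId } : {}), // omit the key entirely when dataId is absent
  };
}
```

With this helper, a call without `dataId` yields an object that has no `dataId` key at all, which is the behaviour the guard in the diff produces.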
102 changes: 43 additions & 59 deletions components/evaluate/evaluation-row.tsx
@@ -1,18 +1,16 @@
import { HoverCell } from "@/components/shared/hover-cell";
import { LLMView } from "@/components/shared/llm-view";
import { Button } from "@/components/ui/button";
import { Checkbox } from "@/components/ui/checkbox";
import detectPII from "@/lib/pii";
import { correctTimestampFormat } from "@/lib/trace_utils";
import { calculatePriceFromUsage, cn, formatDateTime } from "@/lib/utils";
import { calculatePriceFromUsage, formatDateTime } from "@/lib/utils";
import { Evaluation } from "@prisma/client";
import {
ArrowTopRightIcon,
CheckCircledIcon,
CrossCircledIcon,
DotFilledIcon,
} from "@radix-ui/react-icons";
import { ChevronDown, ChevronRight } from "lucide-react";
import Link from "next/link";
import { useState } from "react";
import { useQuery } from "react-query";
@@ -27,30 +25,25 @@ export default function EvaluationRow({
 key,
 span,
 projectId,
-testId,
 page,
 onCheckedChange,
 selectedData,
 }: {
 key: number;
 span: any;
 projectId: string;
-testId: string;
 page: number;
 onCheckedChange: (data: CheckedData, checked: boolean) => void;
 selectedData: CheckedData[];
 }) {
 const [score, setScore] = useState(-100); // 0: neutral, 1: thumbs up, -1: thumbs down
-const [collapsed, setCollapsed] = useState(true);
 const [evaluation, setEvaluation] = useState<Evaluation>();
 const [addedToDataset, setAddedToDataset] = useState(false);

 useQuery({
-queryKey: ["fetch-evaluation-query", span.span_id, testId],
+queryKey: ["fetch-evaluation-query", span.span_id],
 queryFn: async () => {
-const response = await fetch(
-`/api/evaluation?spanId=${span.span_id}&testId=${testId}`
-);
+const response = await fetch(`/api/evaluation?spanId=${span.span_id}`);
 const result = await response.json();
 setEvaluation(result.evaluations.length > 0 ? result.evaluations[0] : {});
 setScore(
@@ -114,16 +107,10 @@ export default function EvaluationRow({
 ).length > 0);

 return (
-<div className="flex flex-col gap-3 w-full" key={key}>
-<div
-className={cn(
-!collapsed ? "border-[1px] border-muted-foreground" : "",
-"grid grid-cols-15 items-center gap-3 py-3 px-4 w-full cursor-pointer"
-)}
-onClick={() => setCollapsed(!collapsed)}
->
+<tr key={key}>
+<td>
 <div
-className="flex flex-row items-center gap-2 col-span-2"
+className="flex flex-row items-center gap-2"
 onClick={(e) => e.stopPropagation()}
 >
 <Checkbox
@@ -154,39 +141,25 @@ export default function EvaluationRow({
 }}
 checked={selectedData.some((d) => d.spanId === span.span_id)}
 />
-<Button
-variant={"ghost"}
-size={"icon"}
-onClick={() => setCollapsed(!collapsed)}
->
-{collapsed && (
-<ChevronRight className="text-muted-foreground w-5 h-5" />
-)}
-{!collapsed && (
-<ChevronDown className="text-muted-foreground w-5 h-5" />
-)}
-</Button>
-<p
-className="text-xs text-muted-foreground font-semibold"
-onClick={() => setCollapsed(!collapsed)}
->
+<p className="text-xs text-muted-foreground font-semibold">
 {formatDateTime(correctTimestampFormat(span.start_time))}
 </p>
 </div>
-<p className="text-xs font-medium">{model}</p>
+</td>
+<td className="text-xs font-medium">{model}</td>
+<td>
 <HoverCell
-className="flex items-center text-xs h-10 truncate overflow-y-scroll font-semibold col-span-2"
+className="h-10 w-48 overflow-hidden truncate text-xs font-semibold"
 values={prompts?.length > 0 ? JSON.parse(prompts) : []}
 />
+</td>
+<td>
 <HoverCell
-className="flex items-center text-xs h-10 truncate overflow-y-scroll font-semibold col-span-2"
+className="w-48 overflow-hidden truncate h-10 text-xs font-semibold"
 values={responses?.length > 0 ? JSON.parse(responses) : []}
 />
-<p className="text-xs font-semibold">
-{cost.total.toFixed(6) !== "0.000000"
-? `\$${cost.total.toFixed(6)}`
-: ""}
-</p>
+</td>
+<td>
 <div className="flex flex-row gap-0 items-center font-semibold">
 {piiDetected ? (
 <DotFilledIcon className="text-red-600 w-6 h-6" />
@@ -195,23 +168,34 @@ export default function EvaluationRow({
 )}
 <p className="text-xs">{piiDetected ? "Yes" : "No"}</p>
 </div>
-<p className="text-xs text-muted-foreground font-semibold">
-{durationMs}ms
-</p>
-<p className="text-sm font-semibold">
-{score !== -100 ? score : "Not evaluated"}
-</p>
-<p className="text-sm font-semibold">
-{userScore ? userScore : "Not evaluated"}
-</p>
-<p className="text-sm font-semibold">{userId || "Not Available"}</p>
-<div className=" col-span-2 flex flex-row items-center justify-evenly">
+</td>
+<td className="text-xs font-semibold text-center">
+{score !== -100 ? score : "Not evaluated"}
+</td>
+<td className="text-xs font-semibold text-center">
+{score !== -100 ? score : "Not evaluated"}
+</td>
+<td className="text-xs font-semibold text-center">
+{score !== -100 ? score : "Not evaluated"}
+</td>
+<td className="text-xs font-semibold text-center">
+{score !== -100 ? score : "Not evaluated"}
+</td>
+<td className="text-xs font-semibold text-center">
+{score !== -100 ? score : "Not evaluated"}
+</td>
+<td className="text-xs font-semibold text-center">
+{userScore ? userScore : "Not evaluated"}
+</td>
+<td className="text-xs font-semibold">{userId || "Not available"}</td>
+<td>
+<div className="flex flex-row items-center justify-evenly">
 {addedToDataset ? (
 <CheckCircledIcon className="text-green-600 w-5 h-5" />
 ) : (
 <CrossCircledIcon className="text-muted-foreground w-5 h-5" />
 )}
-<Link href={`/project/${projectId}/evaluate/${testId}?page=${page}`}>
+<Link href={`/project/${projectId}/evaluate?page=${page}`}>
 <Button
 onClick={(e) => e.stopPropagation()}
 variant={"secondary"}
@@ -221,14 +205,14 @@ export default function EvaluationRow({
 </Button>
 </Link>
 </div>
-</div>
-{!collapsed && (
+</td>
+{/* {!collapsed && (
 <LLMView
 responses={[responses]}
 prompts={[prompts]}
 doPiiDetection={true}
 />
-)}
-</div>
+)} */}
+</tr>
 );
 }