- Write a solver function that solves the problem (see the sketch after this list)
  - A solver is all the scaffolding needed to evaluate, i.e. orchestration + prompts + results, etc.
  - This will also be useful when writing the full agent.
  - Fix the input and output format of the solver.
- Use the OpenAI evals framework to evaluate the solver.
  - Metrics need to be defined; initially the focus is just accuracy.
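As a concrete anchor for the solver's fixed input/output contract, here is a minimal Python sketch. `ScanInput`, `ScanResult`, and their fields are illustrative names, not an established schema, and the LLM call itself is stubbed out:

```python
from dataclasses import dataclass, field

@dataclass
class ScanInput:
    """Fixed input format for the solver (field names are illustrative)."""
    item_id: str
    code: str       # source snippet to scan
    language: str   # e.g. "python"

@dataclass
class ScanResult:
    """Fixed output format: one finding dict per suspected issue."""
    item_id: str
    findings: list = field(default_factory=list)  # e.g. [{"cwe": "CWE-89", "score": 0.9}]

def solve(item: ScanInput) -> ScanResult:
    """Solver = all the evaluation scaffolding: build the prompt,
    call the model, parse the response into findings.
    The model call is a placeholder in this sketch."""
    prompt = f"Scan this {item.language} code for vulnerabilities:\n{item.code}"
    _ = prompt  # placeholder for the LLM call + response parsing
    return ScanResult(item_id=item.item_id, findings=[])
```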
- Write an eval processor that can take the view JSON as input (sketch below)
  - Convert each eval item into a scan-specific item
  - Run it over multiple eval items
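One possible shape for the eval processor, building on the `ScanInput`/`solve` sketch above. It assumes the view JSON is a file holding a list of items with `id`, `code`, and `language` keys, which is a guess at the schema:

```python
import json

def load_view(path: str) -> list[dict]:
    """Read the view JSON; assumed to contain a list of eval items."""
    with open(path) as f:
        return json.load(f)

def to_scan_item(eval_item: dict) -> ScanInput:
    """Convert one eval item into a scan-specific item (key names assumed)."""
    return ScanInput(
        item_id=eval_item["id"],
        code=eval_item["code"],
        language=eval_item.get("language", "unknown"),
    )

def run_all(path: str) -> list[ScanResult]:
    """Run the solver over every eval item in the view."""
    return [solve(to_scan_item(e)) for e in load_view(path)]
```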
- Write a code scanner that can take a scan item as input and perform an LLM scan with it (sketch below)
  - Add OpenAI and Anthropic capability
  - Add batching
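A sketch of the dual-provider scan calls plus simple client-side batching. Both calls use the standard `openai` and `anthropic` Python SDK entry points (`chat.completions.create` and `messages.create`); the model names are examples, not a recommendation:

```python
from openai import OpenAI
import anthropic

def scan_openai(prompt: str, model: str = "gpt-4o") -> str:
    """LLM scan via the OpenAI chat completions API."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def scan_anthropic(prompt: str, model: str = "claude-3-5-sonnet-latest") -> str:
    """LLM scan via the Anthropic messages API."""
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
    msg = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

def batched(items: list, size: int):
    """Yield items in fixed-size chunks (simple client-side batching)."""
    for i in range(0, len(items), size):
        yield items[i : i + size]
```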
- Add metric collection capability from eval runs (sketch below)
  - Token metrics
  - Accuracy metrics
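One way the metric collector could look. The token fields mirror the OpenAI usage object, and exact-match of the predicted vs. labeled CWE set is just one possible accuracy definition:

```python
from dataclasses import dataclass

@dataclass
class RunMetrics:
    prompt_tokens: int = 0
    completion_tokens: int = 0
    correct: int = 0
    total: int = 0

    @property
    def accuracy(self) -> float:
        return self.correct / self.total if self.total else 0.0

def record(metrics: RunMetrics, usage, predicted: set, expected: set) -> None:
    """Accumulate token usage and an exact-match accuracy signal.

    `usage` is assumed to expose prompt/completion token counts the way
    the OpenAI usage object does; Anthropic's usage object would need a
    small adapter (input_tokens/output_tokens).
    """
    metrics.prompt_tokens += usage.prompt_tokens
    metrics.completion_tokens += usage.completion_tokens
    metrics.total += 1
    metrics.correct += int(predicted == expected)
```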
- Script to generate a report of an eval run (sketch below)
  - Get the CWE categories that are unique and add labels accordingly
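A minimal report helper, assuming each result row carries a `cwe` key like `"CWE-79"`; a real report would also attach human-readable labels per category:

```python
from collections import Counter

def cwe_report(results: list[dict]) -> str:
    """Summarize an eval run by unique CWE category (schema assumed)."""
    counts = Counter(r["cwe"] for r in results)
    return "\n".join(f"{cwe}: {n} finding(s)" for cwe, n in counts.most_common())

# Example:
# print(cwe_report([{"cwe": "CWE-89"}, {"cwe": "CWE-89"}, {"cwe": "CWE-79"}]))
```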
- Create a processing pipeline for each eval type (see the sketch after this list)
  - Simple query
  - Batching and code-item segregation
  - Tagged + query
  - Categorization
    - Create broad functional areas and associated CWE issues to use in categorization
    - AI call: categorize code into these "possible" vulnerability buckets using an LLM. This is a pure categorization task.
  - The prompt should work in a chain-of-thought manner to detect, verify, and score issues.
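A sketch of the categorization step: a hand-maintained map from broad functional areas to CWE buckets, and a prompt that asks the model only to pick areas. Detection, verification, and scoring would live in a separate chain-of-thought prompt. The areas and CWE groupings shown are illustrative, not a vetted taxonomy:

```python
# Illustrative functional-area -> CWE buckets; real groupings would come
# from the project's own taxonomy.
BUCKETS = {
    "input handling": ["CWE-20", "CWE-79", "CWE-89"],
    "auth & sessions": ["CWE-287", "CWE-384"],
    "memory safety": ["CWE-119", "CWE-416", "CWE-787"],
}

CATEGORIZE_PROMPT = """You are a code triage assistant.
Given the code below, list which of these functional areas could plausibly
contain vulnerabilities: {areas}.
Respond with area names only, one per line. Do not analyze further;
this is a pure categorization step.

Code:
{code}
"""

def build_categorize_prompt(code: str) -> str:
    """Fill the categorization prompt with the bucket names and the code."""
    return CATEGORIZE_PROMPT.format(areas=", ".join(BUCKETS), code=code)
```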
- Run the eval for multiple eval sets (see the driver sketch below)
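A tiny hypothetical driver tying the pieces together; the eval-set paths are placeholders, and `run_all` comes from the eval-processor sketch above:

```python
# Placeholder eval-set paths; the real list would come from config.
EVAL_SETS = ["evals/set_a.json", "evals/set_b.json"]

for path in EVAL_SETS:
    results = run_all(path)  # scan every item in this eval set
    print(f"{path}: {len(results)} items scanned")
```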