TODO

  • Write a Solver function that solves the problem

  • A solver is all the scaffolding needed to run an evaluation, i.e. orchestration + prompts + results, etc. (a rough interface sketch follows this list)

  • This would be useful when writing the full agent too.

  • Fix the input and output format of the solver

  • Use the OpenAI evals framework to evaluate the solver

    • Metrics need to be defined
    • Initially just accuracy is the focus
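
As a rough illustration of the solver idea above, here is a minimal sketch of what that scaffolding could look like. The `Solver`, `EvalSample`, and `SolverResult` names and the injected `model_client` are hypothetical placeholders, not anything that exists in the repo yet.

```python
# Minimal sketch of a solver interface (all names here are hypothetical).
from dataclasses import dataclass


@dataclass
class EvalSample:
    """One item from the eval dataset."""
    code: str          # code snippet to scan
    label: str         # expected answer, e.g. a CWE id or "safe"


@dataclass
class SolverResult:
    """What a solver returns for a single sample."""
    prediction: str    # model's answer in the same format as `label`
    raw_response: str  # full model output, kept for debugging and metrics


class Solver:
    """Scaffolding that turns a sample into a result: prompt -> model call -> parse."""

    def __init__(self, model_client, prompt_template: str):
        self.model_client = model_client          # assumed provider-agnostic client
        self.prompt_template = prompt_template

    def solve(self, sample: EvalSample) -> SolverResult:
        prompt = self.prompt_template.format(code=sample.code)
        response = self.model_client.complete(prompt)  # assumed: returns the model text
        return SolverResult(prediction=response.strip(), raw_response=response)
```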

Evals dataset orchestration

  • Write an eval processor that takes the view JSON as input (see the sketch after this list)
    • Convert each eval item into a scan-specific item
    • Run it over multiple eval items
  • Write a code scanner that takes a scan item as input and performs an LLM scan with it
  • Add OpenAI and Anthropic capability
  • Add batching
  • Add metric collection from eval runs
    • Token metrics
    • Accuracy metrics
    • Script to generate a report of an eval run
  • Get the unique CWE categories and add labels accordingly
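
A minimal sketch of how the eval processor and metric collection could fit together. The view JSON is assumed here to be a list of objects with `code` and `cwe` fields, and the scanner is assumed to expose a `scan(item)` method returning a prediction and a token count; all of these field names and interfaces are placeholders, not the actual format.

```python
# Sketch of eval orchestration (field names and interfaces are assumptions).
import json
from dataclasses import dataclass


@dataclass
class ScanItem:
    """Scan-specific view of one eval item."""
    code: str
    expected_cwe: str


def load_scan_items(view_json_path: str) -> list[ScanItem]:
    """Convert each eval item in the view JSON into a scan-specific item."""
    with open(view_json_path) as f:
        eval_items = json.load(f)  # assumed: a list of dicts
    return [ScanItem(code=it["code"], expected_cwe=it["cwe"]) for it in eval_items]


def run_eval(scanner, items: list[ScanItem]) -> dict:
    """Run the scanner over all items and collect simple accuracy and token metrics."""
    correct = 0
    total_tokens = 0
    for item in items:
        predicted_cwe, tokens_used = scanner.scan(item)  # assumed return shape
        correct += int(predicted_cwe == item.expected_cwe)
        total_tokens += tokens_used
    return {
        "accuracy": correct / len(items) if items else 0.0,
        "total_tokens": total_tokens,
        "num_items": len(items),
    }
```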

Processing pipelines

  • Create a processing pipeline for each eval type

    • Simple query
    • Batching and code-item segregation
    • Tagged + query
    • Categorization
      • Create broad functional areas and associated CWE issues to use in categorization
      • AI call: categorize code into these "possible" vulnerability buckets using an LLM. Pure categorization task.
    • The prompt should work in a chain-of-thought manner to detect, verify, and score issues (a possible prompt skeleton follows this list)
  • Run the eval across multiple eval sets
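
As a starting point for the chain-of-thought item above, here is one possible prompt skeleton that walks the model through detect, verify, and score steps. The wording, the `candidate_buckets` input from the categorization step, and the output schema are only illustrative assumptions.

```python
# Illustrative chain-of-thought scan prompt (wording and schema are placeholders).
SCAN_PROMPT_TEMPLATE = """You are a security code reviewer.

Candidate vulnerability categories for this code (from the categorization step):
{candidate_buckets}

Analyze the following code step by step:
1. Detect: list every potential issue you see, with the relevant lines.
2. Verify: for each potential issue, explain whether it is actually exploitable
   in this code or a false positive, and why.
3. Score: for each verified issue, give a severity from 1 (low) to 5 (critical)
   and the most likely CWE id.

Return the verified issues as a JSON list of objects with the keys
"cwe", "severity", "lines", and "explanation".

Code:
{code}
"""


def build_scan_prompt(code: str, candidate_buckets: list[str]) -> str:
    """Fill the template for one scan item."""
    return SCAN_PROMPT_TEMPLATE.format(
        candidate_buckets="\n".join(f"- {b}" for b in candidate_buckets),
        code=code,
    )
```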