This repository was archived by the owner on Jun 5, 2025. It is now read-only.

[Task]: Spotting commands in the stream from coding assistants like cline #844

@therealnb

Description

We have done some work on spotting suspicious commands in #34. The task here is to integrate that code into codegate. This involves

Extensions for the future

  • Have more than two categories, e.g. safe, risky, and block
  • Block commands in the 'block' category
  • Make the block behaviour configurable
  • Have more options around context, e.g. which files and directories are writable
  • Have the NN learn from user feedback (i.e. retrain the NN on feedback given in the codegate UI)
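
As a rough illustration of the first three extensions, the sketch below maps a risk score to one of three categories and makes the blocking behaviour a configuration switch. Everything here is hypothetical: `Category`, `BlockPolicy`, `categorise`, `should_block`, and the threshold values are illustrative names and numbers, not existing codegate code, and it assumes the classifier emits a score in [0, 1].

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    SAFE = "safe"
    RISKY = "risky"
    BLOCK = "block"


@dataclass(frozen=True)
class BlockPolicy:
    # Hypothetical thresholds on a risk score in [0, 1]; real values would need tuning.
    risky_threshold: float = 0.5
    block_threshold: float = 0.9
    # When False, commands in the 'block' category are only flagged, never stopped.
    enforce_block: bool = True


def categorise(risk_score: float, policy: BlockPolicy) -> Category:
    """Map a risk score to safe, risky, or block using the policy thresholds."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be in [0, 1]")
    if risk_score >= policy.block_threshold:
        return Category.BLOCK
    if risk_score >= policy.risky_threshold:
        return Category.RISKY
    return Category.SAFE


def should_block(risk_score: float, policy: BlockPolicy) -> bool:
    """Block only when the command is in the 'block' category and blocking is enabled."""
    return policy.enforce_block and categorise(risk_score, policy) is Category.BLOCK
```

Keeping the thresholds and the `enforce_block` switch in one policy object would let the block behaviour be changed from configuration without touching the classifier.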

We will probably have to intercept the commands at

snippets = extract_snippets(current_content)

and write the comment back at

async def _snippet_comment(self, snippet: CodeSnippet, context: PipelineContext) -> str:
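
A minimal sketch of how the two hooks above might fit together, assuming commands arrive as shell-language snippets. The `Snippet` class is a stand-in for codegate's `CodeSnippet` (its field names are assumptions), `placeholder_score` is a crude keyword check standing in for the embedding + ANN model, and `command_comment` is a hypothetical helper, not the real `_snippet_comment`.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable


@dataclass
class Snippet:
    # Hypothetical stand-in for codegate's CodeSnippet; field names are assumptions.
    language: str
    code: str


SHELL_LANGUAGES = {"bash", "sh", "shell", "zsh"}


def placeholder_score(command: str) -> float:
    # Placeholder for the embedding + ANN classifier; keyword matching for illustration only.
    suspicious = ("rm -rf", "curl ", "| sh", "chmod 777")
    return 1.0 if any(token in command for token in suspicious) else 0.0


async def command_comment(
    snippet: Snippet,
    score: Callable[[str], float] = placeholder_score,
    threshold: float = 0.5,
) -> str:
    """Return a warning comment listing flagged commands, or '' if nothing is flagged."""
    if snippet.language.lower() not in SHELL_LANGUAGES:
        return ""
    flagged = [
        line.strip()
        for line in snippet.code.splitlines()
        if line.strip() and score(line) >= threshold
    ]
    if not flagged:
        return ""
    listing = "\n".join(f"- {line}" for line in flagged)
    return f"Warning: potentially risky commands detected:\n{listing}"


if __name__ == "__main__":
    demo = Snippet(language="bash", code="ls -la\nrm -rf /tmp/build\n")
    print(asyncio.run(command_comment(demo)))
```

Passing the scorer in as a callable keeps the comment logic independent of whichever embedding model is eventually chosen.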

As a baseline, we decided to use all-MiniLM-L6-v2 embeddings with post-processing by a small ANN (the hybrid-all-MiniLM-L6-v2 model). We didn't want the extra cost of codebert, and the local ANN seems to provide some benefit.
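
To make the "embedding plus small ANN" idea concrete, here is a dependency-free sketch of the post-processing step: a one-hidden-layer network that turns a 384-dimensional sentence embedding (the output size of all-MiniLM-L6-v2) into a risk score. The hidden width, the random weights, and the names `init_layer`, `dense`, and `risk_score` are assumptions for illustration; the architecture of the trained model from #34 is not described here, and a random vector stands in for a real embedding.

```python
import math
import random

EMBEDDING_DIM = 384  # output size of all-MiniLM-L6-v2 sentence embeddings
HIDDEN_DIM = 32      # hypothetical hidden-layer width


def init_layer(n_in, n_out, rng):
    """Create untrained weights (n_out rows of n_in values) and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Affine layer: one dot product plus bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def risk_score(embedding, hidden, output):
    """Forward pass: dense -> ReLU -> dense -> sigmoid, giving a score in (0, 1)."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(embedding)}")
    activations = [max(0.0, v) for v in dense(embedding, *hidden)]
    (logit,) = dense(activations, *output)
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
hidden = init_layer(EMBEDDING_DIM, HIDDEN_DIM, rng)
output = init_layer(HIDDEN_DIM, 1, rng)

# Stand-in for a real embedding of a command string.
fake_embedding = [rng.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)]
score = risk_score(fake_embedding, hidden, output)
```

In a real integration the embedding would come from the model already shipped with codegate and the weights would be loaded from the trained model file rather than generated randomly.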

Additional Context

We need to decide which model to use for the embeddings. all-MiniLM-L6-v2 works well, especially with an ANN post-processing step. It is already in codegate, so we get it for free. microsoft/codebert-base works better, as expected, but at a cost of 476 MB.
The ANNs are much smaller:
ls -lh | grep hybrid
-rw-r--r-- 1 nigel staff 228K 29 Jan 18:21 hybrid-all-MiniLM-L6-v2.model
-rw-r--r-- 1 nigel staff 420K 29 Jan 18:21 hybrid-microsoft-codebert-base.model

Metadata

Labels

No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
