[Task]: Spotting commands in the stream from coding assistants like cline #844

### Description

We have done some work on spotting suspicious commands in https://github.com/stacklok/codegate/issues/34. The task here is to bring that code into codegate. This involves:
* Creating the model from the code in https://github.com/stacklok/research/blob/command-detection/command_detection/command_models.ipynb. This should result in a function that returns good or bad when fed a command.
* Spotting, in a platform-neutral way (cline, copilot edits, etc.), when a command is returned, categorising it, and logging it if the command is bad.
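The two steps above could be sketched as follows. This is a minimal illustration, not the codegate implementation: the regular expressions assume that commands arrive either in fenced shell blocks or inside a cline-style `<command>` tag, and `score` is a hypothetical callable standing in for the trained model (higher means more suspicious).

```python
import logging
import re
from typing import Callable, List

logger = logging.getLogger("command_detection")

# Assumed output formats: fenced shell blocks and a cline-style <command> tag.
FENCED_SHELL = re.compile(r"```(?:bash|sh|shell|zsh)\n(.*?)```", re.DOTALL)
COMMAND_TAG = re.compile(r"<command>(.*?)</command>", re.DOTALL)


def extract_commands(text: str) -> List[str]:
    """Collect candidate commands from assistant output, one per line."""
    commands: List[str] = []
    for pattern in (FENCED_SHELL, COMMAND_TAG):
        for body in pattern.findall(text):
            for line in body.splitlines():
                line = line.strip()
                # Skip blank lines and shell comments.
                if line and not line.startswith("#"):
                    commands.append(line)
    return commands


def classify_command(
    command: str, score: Callable[[str], float], threshold: float = 0.5
) -> str:
    """Return "bad" when the (hypothetical) model score reaches the threshold."""
    return "bad" if score(command) >= threshold else "good"


def log_bad_commands(text: str, score: Callable[[str], float]) -> List[str]:
    """Log every extracted command classified as bad and return them."""
    bad = [
        cmd for cmd in extract_commands(text)
        if classify_command(cmd, score) == "bad"
    ]
    for cmd in bad:
        logger.warning("Suspicious command detected: %s", cmd)
    return bad
```

Keeping extraction separate from classification means a new assistant format only needs a new pattern, while the model behind `score` can be swapped without touching the parsing.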

Extensions for the future:
* Have more than two categories - e.g. safe, risky, and block
* Block commands in the 'block' category
* Have the block behaviour configurable
* Have more options around context - e.g. files and dirs that are writable
* Have the NN learn from feedback from the user (i.e. retrain the NN from feedback in the codegate UI)
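The first three extensions could look roughly like the sketch below. Everything here is hypothetical: the `Category` names follow the list above, while the threshold values and the `enforce_block` flag are illustrative assumptions rather than agreed settings.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    SAFE = "safe"
    RISKY = "risky"
    BLOCK = "block"


@dataclass(frozen=True)
class DetectionConfig:
    # Illustrative defaults; real values would need tuning on labelled data.
    risky_threshold: float = 0.5
    block_threshold: float = 0.9
    # When False, commands in the 'block' category are only reported.
    enforce_block: bool = False


def categorise(score: float, config: DetectionConfig) -> Category:
    """Map a suspicion score in [0, 1] to one of three categories."""
    if score >= config.block_threshold:
        return Category.BLOCK
    if score >= config.risky_threshold:
        return Category.RISKY
    return Category.SAFE


def should_block(score: float, config: DetectionConfig) -> bool:
    """Block only when the config enables it and the category is BLOCK."""
    return config.enforce_block and categorise(score, config) is Category.BLOCK
```

With `enforce_block` off by default, the three-way categorisation can ship first as a logging-only change, and blocking can be enabled per deployment later.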


We will probably have to intercept the commands at
```
snippets = extract_snippets(current_content)
```
and write the comment back at
```
async def _snippet_comment(self, snippet: CodeSnippet, context: PipelineContext) -> str:
```
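As a rough sketch of what the comment step might produce, the example below builds a warning string for a shell snippet. The `Snippet` dataclass is a simplified stand-in for codegate's `CodeSnippet` (its `code` and `language` fields are assumptions here), the function is synchronous and takes no `PipelineContext`, and `score` is again a hypothetical model callable.

```python
from dataclasses import dataclass
from typing import Callable, Optional

SHELL_LANGUAGES = {"bash", "sh", "shell", "zsh"}


@dataclass
class Snippet:
    """Simplified stand-in for CodeSnippet; field names are assumptions."""
    code: str
    language: Optional[str] = None


def command_comment(
    snippet: Snippet, score: Callable[[str], float], threshold: float = 0.5
) -> str:
    """Return a warning comment for flagged commands, or "" if none."""
    if (snippet.language or "").lower() not in SHELL_LANGUAGES:
        return ""
    lines = [line.strip() for line in snippet.code.splitlines()]
    flagged = [cmd for cmd in lines if cmd and score(cmd) >= threshold]
    if not flagged:
        return ""
    listing = "\n".join(f"- `{cmd}`" for cmd in flagged)
    return (
        "\n\n**Warning:** CodeGate flagged these commands as suspicious:\n"
        f"{listing}\n"
    )
```

Returning an empty string for clean or non-shell snippets lets the caller append the result unconditionally.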
As a baseline, we decided to use the `hybrid-all-MiniLM-L6-v2` model, that is, all-MiniLM-L6-v2 embeddings with post-processing by a small ANN. We didn't want the extra cost of codebert, but the local ANN seems to produce some benefit.
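To make the "embedding plus small ANN" idea concrete, here is a toy forward pass for such a head in plain Python. The architecture is an assumption for illustration only (one ReLU hidden layer of 16 units and a sigmoid output, with random untrained weights); the real layers and weights are whatever the notebook produces. The input size of 384 matches the embedding dimension of all-MiniLM-L6-v2.

```python
import math
import random
from typing import Sequence

EMBEDDING_DIM = 384  # embedding size produced by all-MiniLM-L6-v2


class TinyHead:
    """Toy feed-forward head: embedding -> ReLU hidden layer -> sigmoid."""

    def __init__(self, input_dim: int = EMBEDDING_DIM,
                 hidden_dim: int = 16, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Random weights stand in for trained parameters.
        self.w1 = [[rng.gauss(0.0, 0.05) for _ in range(input_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [rng.gauss(0.0, 0.05) for _ in range(hidden_dim)]
        self.b2 = 0.0

    def predict(self, embedding: Sequence[float]) -> float:
        """Return a suspicion score in (0, 1) for one embedding vector."""
        if len(embedding) != len(self.w1[0]):
            raise ValueError("embedding has the wrong dimension")
        hidden = [
            max(0.0, sum(w * x for w, x in zip(row, embedding)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        logit = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return 1.0 / (1.0 + math.exp(-logit))
```

In use, the embedding would come from the sentence-embedding model already present in codegate, and the returned score would be thresholded to give the good or bad label.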

### Additional Context
We need to decide which model to use for the embeddings. all-MiniLM-L6-v2 works well, especially with an ANN post-processing step. It is already in codegate, so we get it for free. microsoft/codebert-base works better, as expected, but at a cost of 476 MB.
The ANNs are much smaller:
ls -lh | grep hybrid
-rw-r--r-- 1 nigel staff  228K 29 Jan 18:21 hybrid-all-MiniLM-L6-v2.model
-rw-r--r-- 1 nigel staff  420K 29 Jan 18:21 hybrid-microsoft-codebert-base.model
