User Customizable Pipelines #769

Open
@lukehinds

Description

Currently, our pipeline runs a predefined sequence of steps defined in code:

def create_input_pipeline(self) -> SequentialPipelineProcessor:
    input_steps: List[PipelineStep] = [
        # make sure that this step is always first in the pipeline
        # the other steps might send the request to a LLM for it to be analyzed
        # and without obfuscating the secrets, we'd leak the secrets during those
        # later steps
        CodegateSecrets(),
        CodegateCli(),
        CodeSnippetExtractor(),
        CodegateContextRetriever(),
        SystemPrompt(Config.get_config().prompts.default_chat),
    ]
    return SequentialPipelineProcessor(input_steps, self.secrets_manager, is_fim=False)

While the default pipeline configuration serves most users well, some may have requirements that call for changes to the order or nature of the steps performed. For instance, a user working with a local LLM may be less concerned about secrets leakage but very concerned about malicious packages.

To address this need for adaptability, I propose implementing customizable pipelines that allow users to:

  1. Reorder pipeline stages: Users could adjust the sequence of steps based on their project's risk profile or coding assistant preferences.
  2. Enable/disable stages: Users could choose whether to include certain steps in their workflow, provided they understand the risks involved and are given clear warnings about disabling crucial steps (see below).
  3. Customize stage configurations: Within each pipeline stage, users should be able to add custom entries or exceptions tailored to their needs. For example, they might want to add, on the fly, specific strings or patterns for the secret-redaction step to redact.
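The three capabilities above could replace the hard-coded list in create_input_pipeline with data. The sketch below is a minimal illustration, assuming a hypothetical StepConfig record and a hypothetical STEP_REGISTRY that maps step ids to constructors. Step is only a stand-in for CodeGate's real PipelineStep classes, and the ids and option names are placeholders, not an existing CodeGate API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class StepConfig:
    # Hypothetical per-step settings; field names loosely mirror the JSON example in this issue.
    id: str
    order: int
    enabled: bool = True
    options: Dict[str, Any] = field(default_factory=dict)


class Step:
    """Stand-in for a CodeGate PipelineStep (hypothetical, for illustration only)."""

    def __init__(self, name: str, **options: Any) -> None:
        self.name = name
        self.options = options

    def __repr__(self) -> str:
        return f"Step({self.name!r})"


# Hypothetical registry: step id -> factory that accepts the step's custom options.
STEP_REGISTRY: Dict[str, Callable[..., Step]] = {
    "codegate-secrets": lambda **o: Step("CodegateSecrets", **o),
    "codegate-cli": lambda **o: Step("CodegateCli", **o),
    "code-snippet-extractor": lambda **o: Step("CodeSnippetExtractor", **o),
    "codegate-context-retriever": lambda **o: Step("CodegateContextRetriever", **o),
}


def build_steps(configs: List[StepConfig]) -> List[Step]:
    """Instantiate the enabled steps in the user-chosen order."""
    steps = []
    for cfg in sorted(configs, key=lambda c: c.order):
        if not cfg.enabled:
            continue  # the user explicitly disabled this step
        if cfg.id not in STEP_REGISTRY:
            raise ValueError(f"unknown pipeline step: {cfg.id}")
        steps.append(STEP_REGISTRY[cfg.id](**cfg.options))
    return steps


# Example: keep secrets first, swap two later steps, disable the CLI step.
configs = [
    StepConfig("codegate-secrets", order=1, options={"custom_signatures": ["ACME_TOKEN_"]}),
    StepConfig("codegate-context-retriever", order=2),
    StepConfig("code-snippet-extractor", order=3),
    StepConfig("codegate-cli", order=4, enabled=False),
]
steps = build_steps(configs)
```

The example deliberately keeps the secrets step first, since the comment in the current code explains why later steps depend on it.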

Implementation Details

To implement this feature, we would likely need a pipelines endpoint that supports operations such as:

  • Retrieve the Current Pipeline Configuration: Fetch the list of steps, their order, and whether they are enabled/disabled.
  • Update the Pipeline Configuration: Allow reordering of steps, enabling/disabling specific steps, and saving the updated configuration.
  • Add/Remove Steps: Dynamically include steps in, or exclude them from, the pipeline.

To spitball, something like the following:

HTTP Method   Endpoint                Description
GET           /pipeline/steps         Retrieve the current pipeline configuration
PUT           /pipeline/steps/{id}    Update a specific step (e.g., reorder, enable/disable)
PATCH         /pipeline/reorder       Reorder multiple steps in one request

{
  "id": "codegate-secrets",
  "name": "CodeGate Secrets Pipeline Step",
  "order": 1,
  "enabled": true,
  "custom_signatures": []
}
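The example step object above could be validated as it arrives. Pydantic would be the natural fit, per the considerations below; to keep the sketch dependency-free, it uses a standard-library dataclass with explicit checks instead. PipelineStepSchema and parse_step are hypothetical names, and the field set simply mirrors the JSON example.

```python
import json
from dataclasses import dataclass, field
from typing import List

REQUIRED_FIELDS = {"id", "name", "order", "enabled"}
KNOWN_FIELDS = REQUIRED_FIELDS | {"custom_signatures"}


@dataclass
class PipelineStepSchema:
    """One step as exchanged with the proposed /pipeline/steps endpoints (hypothetical)."""

    id: str
    name: str
    order: int
    enabled: bool
    custom_signatures: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Type and range checks that a schema validator such as Pydantic would normally do.
        if not isinstance(self.id, str) or not self.id:
            raise ValueError("id must be a non-empty string")
        if not isinstance(self.name, str):
            raise ValueError("name must be a string")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(self.order, bool) or not isinstance(self.order, int) or self.order < 1:
            raise ValueError("order must be a positive integer")
        if not isinstance(self.enabled, bool):
            raise ValueError("enabled must be a boolean")
        if not isinstance(self.custom_signatures, list) or not all(
            isinstance(s, str) for s in self.custom_signatures
        ):
            raise ValueError("custom_signatures must be a list of strings")


def parse_step(payload: str) -> PipelineStepSchema:
    """Parse a JSON request body, rejecting missing or unknown fields."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("step payload must be a JSON object")
    missing = REQUIRED_FIELDS - set(data)
    unknown = set(data) - KNOWN_FIELDS
    if missing or unknown:
        raise ValueError(f"missing={sorted(missing)} unknown={sorted(unknown)}")
    return PipelineStepSchema(**data)


step = parse_step(
    '{"id": "codegate-secrets", "name": "CodeGate Secrets Pipeline Step",'
    ' "order": 1, "enabled": true, "custom_signatures": []}'
)
```

Rejecting unknown fields up front keeps a typo such as "enable" from being silently ignored.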

Some other key considerations

  • Data Validation: Use a schema validator like Pydantic for inputs.
  • Atomic Operations: When reordering or updating steps, ensure no partial updates occur.
  • Database Backing: Replace in-memory storage of signatures with a persistent database (e.g., SQLite, stored in codegate.db).
  • Versioning: Consider versioning the API for future enhancements (/v1/pipeline/steps).
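Atomic operations and database backing fit together: validate the whole PATCH /pipeline/reorder request first, then write every new position inside one transaction so that any failure rolls everything back. A minimal sketch using Python's built-in sqlite3 module with an in-memory database; the pipeline_steps table and its columns are hypothetical, not CodeGate's actual schema.

```python
import sqlite3
from typing import List


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical table layout; a real codegate.db schema may differ.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pipeline_steps ("
        " id TEXT PRIMARY KEY,"
        " step_order INTEGER NOT NULL,"
        " enabled INTEGER NOT NULL DEFAULT 1)"
    )


def reorder_steps(conn: sqlite3.Connection, new_order: List[str]) -> None:
    """Apply a reorder request so that either every row is updated or none is."""
    existing = {row[0] for row in conn.execute("SELECT id FROM pipeline_steps")}
    # Validate before writing: the request must list every existing step exactly once.
    if len(new_order) != len(set(new_order)) or set(new_order) != existing:
        raise ValueError("reorder request must list every existing step exactly once")
    # Using the connection as a context manager commits on success and rolls back on error.
    with conn:
        for position, step_id in enumerate(new_order, start=1):
            conn.execute(
                "UPDATE pipeline_steps SET step_order = ? WHERE id = ?",
                (position, step_id),
            )


conn = sqlite3.connect(":memory:")
init_db(conn)
with conn:
    conn.executemany(
        "INSERT INTO pipeline_steps (id, step_order) VALUES (?, ?)",
        [("codegate-secrets", 1), ("codegate-cli", 2), ("code-snippet-extractor", 3)],
    )
reorder_steps(conn, ["codegate-secrets", "code-snippet-extractor", "codegate-cli"])
```

Checking that the request is a full permutation of the stored steps avoids duplicate or missing positions, which a partial update would otherwise leave behind.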

UI and Design

The UI should offer a user-friendly interface for pipeline management that lets users drag and drop pipeline steps to change their execution order or to include or exclude them. Each stage could have an associated configuration panel where users can enter custom rules (e.g., patterns to match for redaction) or exceptions tailored to their needs.
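Behind such a configuration panel, custom rules could simply be merged with the built-in ones before redaction runs. A minimal sketch, assuming user rules arrive as regular expressions in the custom_signatures field of the step example; the default pattern and the REDACTED placeholder are illustrative assumptions, not how CodeGate's secrets step actually obfuscates values.

```python
import re
from typing import Iterable, List

# Hypothetical built-in signature (shape of an AWS access key id); not CodeGate's real list.
DEFAULT_PATTERNS: List[str] = [r"AKIA[0-9A-Z]{16}"]


def compile_patterns(custom_signatures: Iterable[str]) -> List[re.Pattern]:
    """Merge default patterns with user-supplied ones from the configuration panel."""
    compiled = []
    for pattern in [*DEFAULT_PATTERNS, *custom_signatures]:
        try:
            compiled.append(re.compile(pattern))
        except re.error as exc:
            # Report bad user input instead of silently skipping it.
            raise ValueError(f"invalid redaction pattern {pattern!r}: {exc}") from exc
    return compiled


def redact(text: str, patterns: Iterable[re.Pattern], placeholder: str = "REDACTED") -> str:
    """Replace every match of every pattern with the placeholder."""
    for pattern in patterns:
        text = pattern.sub(placeholder, text)
    return text


patterns = compile_patterns([r"acme_[a-z0-9]{8}"])
result = redact("key=AKIAABCDEFGHIJKLMNOP token=acme_1a2b3c4d", patterns)
print(result)  # key=REDACTED token=REDACTED
```

Compiling user patterns when they are saved, rather than on every request, also lets the panel reject an invalid expression immediately.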

Regarding the handling of conflicts between default and custom rules, CodeGate should provide clear warnings when users attempt to disable crucial steps. For instance:

Warning: Disabling the 'Hardcoded Secrets Detection' step may expose your code to risks such as unauthorized access or data breaches. Are you sure you want to proceed?

Despite such warnings, we should not lock users out of decisions about their own workflows. Some users may have valid reasons for disabling certain steps, such as specific project requirements or known false positives.
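Warning without locking users out can be modeled as a two-step confirmation: disabling a crucial step first fails with the warning text, and succeeds only when the caller explicitly confirms. A sketch under assumed names (CRUCIAL_STEPS, ConfirmationRequired, StepState and set_enabled are all hypothetical); enabling a step, or disabling a non-crucial one, never asks.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical set of crucial steps; the issue names secrets detection as one example.
CRUCIAL_STEPS: Dict[str, str] = {
    "codegate-secrets": (
        "Disabling the 'Hardcoded Secrets Detection' step may expose your code to risks "
        "such as unauthorized access or data breaches."
    ),
}


class ConfirmationRequired(Exception):
    """Raised when a change needs explicit user confirmation before it is applied."""


@dataclass
class StepState:
    id: str
    enabled: bool = True


def set_enabled(
    steps: List[StepState], step_id: str, enabled: bool, confirmed: bool = False
) -> Optional[str]:
    """Toggle a step; warn about crucial steps but never hard-block the user."""
    step = next((s for s in steps if s.id == step_id), None)
    if step is None:
        raise KeyError(step_id)
    warning = CRUCIAL_STEPS.get(step_id) if not enabled else None
    if warning and not confirmed:
        raise ConfirmationRequired(warning + " Are you sure you want to proceed?")
    step.enabled = enabled
    return warning  # returned so the UI or an audit log can record the accepted risk


steps = [StepState("codegate-secrets"), StepState("codegate-cli")]
try:
    set_enabled(steps, "codegate-secrets", False)
except ConfirmationRequired as exc:
    print(f"Warning: {exc}")  # the UI would show this and ask the user to confirm
set_enabled(steps, "codegate-secrets", False, confirmed=True)
```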

Workspace-specific

Pipelines should be configurable per workspace. Each workspace could start from a global default; once customized, its pipeline is saved as a workspace-specific configuration. Once a pipeline is saved, we could also let users optionally make it available to all other projects. This level of implementation should come much later, though; an initial prototype with one pipeline for all projects may suffice.
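The workspace fallback described above can be sketched as copy-on-save: every workspace reads the global default until it saves its own pipeline, and a saved pipeline can optionally be published for other workspaces to adopt. This is an in-memory illustration with hypothetical names (PipelineStore, publish, adopt); a real implementation would need to persist these records, e.g. in the database mentioned above.

```python
from typing import Dict, List

# Hypothetical global default, expressed as an ordered list of step ids.
GLOBAL_DEFAULT: List[str] = [
    "codegate-secrets",
    "codegate-cli",
    "code-snippet-extractor",
    "codegate-context-retriever",
]


class PipelineStore:
    """Per-workspace pipelines that fall back to a global default until customized."""

    def __init__(self, default: List[str]) -> None:
        self._default = list(default)
        self._custom: Dict[str, List[str]] = {}  # workspace -> customized pipeline
        self._shared: Dict[str, List[str]] = {}  # template name -> published pipeline

    def get(self, workspace: str) -> List[str]:
        # Return a copy so callers cannot mutate stored pipelines by accident.
        return list(self._custom.get(workspace, self._default))

    def save(self, workspace: str, steps: List[str]) -> None:
        # Saving stores a workspace-specific copy; other workspaces are unaffected.
        self._custom[workspace] = list(steps)

    def publish(self, workspace: str, template: str) -> None:
        """Optionally make a workspace's saved pipeline available to other workspaces."""
        if workspace not in self._custom:
            raise KeyError(f"workspace {workspace!r} has no customized pipeline")
        self._shared[template] = list(self._custom[workspace])

    def adopt(self, workspace: str, template: str) -> None:
        """Copy a published pipeline into another workspace."""
        self.save(workspace, self._shared[template])


store = PipelineStore(GLOBAL_DEFAULT)
store.save("local-llm", ["codegate-cli", "code-snippet-extractor", "codegate-context-retriever"])
store.publish("local-llm", "no-secret-redaction")
store.adopt("other-project", "no-secret-redaction")
```

Copying on save and on adopt means a later edit in one workspace never changes another workspace's pipeline behind its back.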

Metadata

Assignees: No one assigned
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
