title

sidebar_position

description

keywords

Codebase scans with Semgrep

30

Scan a codebase using Semgrep

STO

security

SAST

security

codebase

Semgrep

This tutorial shows you how to scan your codebases using Semgrep, a popular code-scanning tool. Semgrep supports a wide variety of languages and includes a free version for individuals who want to scan files locally.

In this tutorial, you'll set up a simple ingestion-only workflow with two steps. The first step runs the scan; the second step ingests the results.

:::important important notes

This tutorial uses the free version of Semgrep to run simple SAST scans. More advanced workflows are possible but are outside the scope of this tutorial.
Semgrep scans use an agent that uploads data to the Semgrep cloud by default. Semgrep uses this data to improve the user experience. Therefore this tutorial is not suitable for air-gapped environments.
This tutorial has the following prerequisites:
- A Harness account and STO module license.
- A basic understanding of key STO concepts and good practices. Here are some good resources:
  - Your first STO pipeline
  - Key Concepts in STO
- A code repo connector and an access token to your Git provider account.
- A Semgrep account login and access token. For specific instructions, go to Getting started from the CLI in the README on GitHub.
- Your Git and Semgrep access tokens must be stored as Harness secrets

:::

Set up your codebase

To do this tutorial, you need a codebase connector to your Git repository and an access token. A connector can specify a Git account (https://github.com/my-account) or a specific repository (https://github.com/my-account).

This tutorial uses the dvpwa repository as an example. The simplest setup is to fork this repository into your Git account and scan the fork. However, you can run your scans on any codebase that uses a language supported by Semgrep.

Set up your pipeline

Do the following:

Select Security Testing Orchestration (left menu, top) > Pipelines > Create a Pipeline. Enter a name and click Start.
In the new pipeline, select Add stage > Security Tests.
Set up your stage as follows:
1. Enter a Stage Name.
2. In Select Git Provider, select the connector to your Git provider account.
3. In Repository Name, click the value type select (tack button) and select Runtime Input.
Go to Infrastructure and select Cloud, Linux, and AMD64 or ARM64 for the infrastructure, OS, and architecture.

You can also use a Kubernetes or Docker infrastructure, but these require additional work to set up.

Add the scan step

Now you will add a step that runs a scan using the local Semgrep container image maintained by Harness.

Go to Execution and add a Run step.
Configure the step as follows:
1. Name = run_semgrep_scan
2. Command = semgrep /harness --sarif --config auto -o /harness/results.sarif
  
  This command runs a Semgrep scan on your code repo and outputs the results to a SARIF file.
3. Open Optional Configuration and set the following options:
  1. Container Registry — When prompted, select Account and then Harness Docker Connector.
  2. Image = returntocorp/semgrep
  3. Add the following environment variable:
    - Key : SEMGREP_APP_TOKEN
    - Value : Click the type selector (right), set the value type to Expression, and enter the value <+secrets.getValue("YOUR_SEMGREP_TOKEN_SECRET")>.
  4. Limit Memory = 4096Mi (Kubernetes or Docker infrastructures only)

Add the Semgrep (ingest) step

Now that you've added a step to run the scan, it's a simple matter to ingest it into your pipeline. Harness provides a set of customized steps for popular scanners such as Semgrep.

In Execution, add a Semgrep step after your Run step.
Configure the step as follows:
1. Name = ingest_semgrep_data
2. Type = Repository
3. Under Target:
  1. Name = Select Runtime Input as the value type.
  2. Variant = Select Runtime Input as the value type.
4. Ingestion File = /harness/results.sarif
5. Fail on Severity = Critical

Run the pipeline and check your results

In the Pipeline Studio, select Run (top right).
When prompted, enter your runtime inputs.
- Under Codebase, enter the repository and branch to scan.
- Under Stage: <stage_name>, enter the [target name] and [variant] you want to use. In most cases, you want to use the repository for the target and the branch for the variant.
If you're scanning the codebase for the first time, enter the root branch of your repo. This is usually the main or master branch.

If you're scanning the example repository mentioned above, enter dvpwa for the repository and target, and master for the branch and variant.
Run the pipeline and then wait for the execution to finish.

If you used the example repository mentioned above, you'll see that the pipeline failed for an entirely expected reason: the Semgrep step is configured to fail the pipeline if the scan detected any critical vulnerabilities. The final log entry for the Semgrep step reads: Exited with message: fail_on_severity is set to critical and that threshold was reached.
Select Security Tests and examine any issues detected by your scan.

Specify the baseline

:::tip

It is good practice to specify a baseline for every target. Defining a baseline makes it easy for developers to drill down into “shift-left” issues in downstream variants and security personnel to drill down into “shift-right” issues in the baseline.

:::

Select Test Targets (left menu).
Select the baseline you want for your target.

YAML pipeline example

pipeline:
  name: semgrep-simple-scan
  identifier: semgrepsimplescan
  projectIdentifier: MY_PROJECT
  orgIdentifier: MY_HARNESS_ORG
  tags: {}
  stages:
    - stage:
        name: semgrep_tutorial_test_stage
        identifier: semgrep_tutorial_test_stage
        description: ""
        type: SecurityTests
        spec:
          cloneCodebase: true
          platform:
            os: Linux
            arch: Arm64
          runtime:
            type: Cloud
            spec: {}
          execution:
            steps:
              - step:
                  type: Run
                  name: run_semgrep_scan
                  identifier: run_semgrep_scan
                  spec:
                    connectorRef: account.harnessImage
                    image: returntocorp/semgrep
                    shell: Sh
                    command: semgrep /harness --sarif --config auto -o /harness/results.sarif
                    envVariables:
                      SEMGREP_APP_TOKEN: <+secrets.getValue("MY_SEMGREP_KEY")>
              - step:
                  type: Semgrep
                  name: ingest_semgrep_data
                  identifier: ingest_semgrep_data
                  spec:
                    mode: ingestion
                    config: default
                    target:
                      name: <+input>
                      type: repository
                      variant: <+input>
                    advanced:
                      log:
                        level: info
                      fail_on_severity: critical
                    ingestion:
                      file: /harness/results.sarif
  properties:
    ci:
      codebase:
        connectorRef: MY_GIT_CONNECTOR
        repoName: <+input>
        build: <+input>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sast-scan-semgrep.md

sast-scan-semgrep.md

Set up your codebase

Set up your pipeline

Add the scan step

Add the Semgrep (ingest) step

Run the pipeline and check your results

Specify the baseline

YAML pipeline example

Files

sast-scan-semgrep.md

Latest commit

History

sast-scan-semgrep.md

File metadata and controls

Set up your codebase

Set up your pipeline

Add the scan step

Add the Semgrep (ingest) step

Run the pipeline and check your results

Specify the baseline

YAML pipeline example