---
title: Codebase scans with Semgrep
sidebar_position: 30
description: Scan a codebase using Semgrep
---
This tutorial shows you how to scan your codebases using Semgrep, a popular code-scanning tool. Semgrep supports a wide variety of languages and includes a free version for individuals who want to scan files locally.
In this tutorial, you'll set up a simple ingestion-only workflow with two steps. The first step runs the scan; the second step ingests the results.
:::important important notes
- This tutorial uses the free version of Semgrep to run simple SAST scans. More advanced workflows are possible but are outside the scope of this tutorial.
- By default, Semgrep scans use an agent that uploads data to the Semgrep cloud. Semgrep uses this data to improve the user experience. This tutorial is therefore not suitable for air-gapped environments.
- This tutorial has the following prerequisites:
  - A Harness account and STO module license.
  - A basic understanding of key STO concepts and good practices.
  - A code repo connector and an access token to your Git provider account.
  - A Semgrep account login and access token. For specific instructions, go to "Getting started from the CLI" in the Semgrep README on GitHub.
  - Your Git and Semgrep access tokens, stored as Harness secrets.
:::
To complete this tutorial, you need a codebase connector to your Git repository and an access token. A connector can specify a Git account (https://github.com/my-account) or a specific repository (https://github.com/my-account/my-repository).
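As a rough illustration of the two connector scopes, the sketch below resolves the repository URL that a pipeline would end up cloning. The helper `resolve_repo_url` and the repository name `my-repository` are hypothetical and not part of any Harness API. The sketch only assumes that an account-level connector is combined with a repository name supplied separately, for example as a runtime input.

```python
from typing import Optional


def resolve_repo_url(connector_url: str, repo_name: Optional[str] = None) -> str:
    """Return the repository URL implied by a connector URL.

    Assumption: an account-level connector (https://host/account) needs a
    repository name appended, while a repository-level connector already
    points at the repository and is used as-is.
    """
    base = connector_url.rstrip("/")
    if repo_name is None:
        # Repository-level connector: nothing to append.
        return base
    return f"{base}/{repo_name.strip('/')}"


# Account-level connector plus a repository name entered at runtime.
account_scoped = resolve_repo_url("https://github.com/my-account", "my-repository")
# Repository-level connector used directly.
repo_scoped = resolve_repo_url("https://github.com/my-account/my-repository")

print(account_scoped)
print(repo_scoped)
```

Both calls resolve to the same repository. The practical difference is that the account-level connector can be reused across repositories.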
This tutorial uses the dvpwa repository as an example. The simplest setup is to fork this repository into your Git account and scan the fork. However, you can run your scans on any codebase that uses a language supported by Semgrep.
Do the following:
- Select Security Testing Orchestration (left menu, top) > Pipelines > Create a Pipeline. Enter a name and click Start.
- In the new pipeline, select Add stage > Security Tests.
- Set up your stage as follows:
  - Enter a Stage Name.
  - In Select Git Provider, select the connector to your Git provider account.
  - In Repository Name, click the value-type selector (tack button) and select Runtime Input.
- Go to Infrastructure and select Cloud, Linux, and AMD64 or ARM64 for the infrastructure, OS, and architecture.
  You can also use a Kubernetes or Docker infrastructure, but these require additional work to set up.
Now you will add a step that runs a scan using the local Semgrep container image maintained by Harness.
- Go to Execution and add a Run step.
- Configure the step as follows:
  - Name = `run_semgrep_scan`
  - Command = `semgrep /harness --sarif --config auto -o /harness/results.sarif`
    This command runs a Semgrep scan on your code repo and outputs the results to a SARIF file.
  - Open Optional Configuration and set the following options:
    - Container Registry: When prompted, select Account and then Harness Docker Connector.
    - Image = `returntocorp/semgrep`
    - Add the following environment variable:
      - `SEMGREP_APP_TOKEN` = `<+secrets.getValue("MY_SEMGREP_KEY")>`, where `MY_SEMGREP_KEY` is the Harness secret that stores your Semgrep access token.
    - Limit Memory = 4096Mi (Kubernetes or Docker infrastructures only)
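To get a feel for what the Run step writes to `/harness/results.sarif`, the following sketch counts SARIF results per level. It is only an illustration: the sample document and its rule IDs are hand-written and hypothetical, not real Semgrep output. As a simplification, the sketch treats a result with no `level` as `warning`.

```python
import json
from collections import Counter


def summarize_sarif(sarif: dict) -> Counter:
    """Count SARIF results per level across all runs."""
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Simplification: fall back to "warning" when no level is set.
            counts[result.get("level", "warning")] += 1
    return counts


# Hypothetical, hand-written sample. A real file could be loaded with
# json.load(open("/harness/results.sarif")) instead.
sample = json.loads("""
{
  "version": "2.1.0",
  "runs": [
    {
      "tool": {"driver": {"name": "semgrep"}},
      "results": [
        {"ruleId": "example.rule-a", "level": "error",
         "message": {"text": "finding A"}},
        {"ruleId": "example.rule-b", "level": "warning",
         "message": {"text": "finding B"}},
        {"ruleId": "example.rule-c",
         "message": {"text": "finding C"}}
      ]
    }
  ]
}
""")

summary = summarize_sarif(sample)
print(dict(summary))
```

A quick summary like this can help confirm that the scan produced findings before the ingestion step runs.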
Now that you've added a step to run the scan, the next step is to ingest the scan results into your pipeline. Harness provides a set of customized steps for popular scanners such as Semgrep.
- In Execution, add a Semgrep step after your Run step.
- Configure the step as follows:
  - Name = `ingest_semgrep_data`
  - Type = Repository
  - Under Target:
    - Name = Select Runtime Input as the value type.
    - Variant = Select Runtime Input as the value type.
  - Ingestion File = `/harness/results.sarif`
  - Fail on Severity = Critical
- In the Pipeline Studio, select Run (top right).
- When prompted, enter your runtime inputs.
  - Under Codebase, enter the repository and branch to scan.
  - Under `Stage: <stage_name>`, enter the target name and variant you want to use. In most cases, you want to use the repository for the target and the branch for the variant.
    If you're scanning the codebase for the first time, enter the root branch of your repo. This is usually the `main` or `master` branch.
    If you're scanning the example repository mentioned above, enter `dvpwa` for the repository and target, and `master` for the branch and variant.
- Run the pipeline and then wait for the execution to finish.
  If you used the example repository mentioned above, you'll see that the pipeline failed for an entirely expected reason: the Semgrep step is configured to fail the pipeline if the scan detected any critical vulnerabilities. The final log entry for the Semgrep step reads:
  `Exited with message: fail_on_severity is set to critical and that threshold was reached.`
- Select Security Tests and examine any issues detected by your scan.
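The failure described above can be pictured as a simple threshold check. The sketch below is a hypothetical model, not the actual Harness implementation. The severity names other than `critical`, and their ordering, are assumptions made for illustration.

```python
# Assumed ordering from least to most severe (only "critical" appears in
# this tutorial; the other names are illustrative).
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def threshold_reached(issue_severities, fail_on_severity):
    """Return True if any issue is at or above the configured threshold."""
    threshold = SEVERITY_ORDER.index(fail_on_severity.lower())
    return any(
        SEVERITY_ORDER.index(severity.lower()) >= threshold
        for severity in issue_severities
    )


# A scan with one critical issue trips a "critical" threshold.
fails = threshold_reached(["low", "medium", "critical"], "critical")
# A scan with nothing above "high" does not.
passes = threshold_reached(["low", "high"], "critical")

print(fails, passes)
```

Under this model, lowering the threshold (for example, to `high`) makes the step fail on more scans, while a scan with no issues at or above the threshold lets the pipeline continue.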
:::tip
It is good practice to specify a baseline for every target. Defining a baseline makes it easy for developers to drill down into “shift-left” issues in downstream variants and for security personnel to drill down into “shift-right” issues in the baseline.
:::
- Select Test Targets (left menu).
- Select the baseline you want for your target.
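One way to picture why a baseline is useful is to compare the issues found in a downstream variant against those found in the baseline. The sketch below is purely conceptual: the function name and the issue identifiers are hypothetical, and it does not describe how STO stores or compares issues.

```python
def compare_to_baseline(variant_issues, baseline_issues):
    """Split a variant's issues into those unique to it and those shared."""
    variant, baseline = set(variant_issues), set(baseline_issues)
    return {
        # Issues introduced in the variant and not yet in the baseline.
        "only_in_variant": sorted(variant - baseline),
        # Issues the variant inherits from the baseline.
        "also_in_baseline": sorted(variant & baseline),
    }


# Hypothetical issue identifiers for a baseline branch and a feature branch.
baseline = ["sql-injection", "hardcoded-secret"]
feature_branch = ["sql-injection", "hardcoded-secret", "open-redirect"]

comparison = compare_to_baseline(feature_branch, baseline)
print(comparison)
```

In this picture, the first group is what a developer working on the variant would look at first, and the second group is what security personnel would track in the baseline.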
pipeline:
  name: semgrep-simple-scan
  identifier: semgrepsimplescan
  projectIdentifier: MY_PROJECT
  orgIdentifier: MY_HARNESS_ORG
  tags: {}
  stages:
    - stage:
        name: semgrep_tutorial_test_stage
        identifier: semgrep_tutorial_test_stage
        description: ""
        type: SecurityTests
        spec:
          cloneCodebase: true
          platform:
            os: Linux
            arch: Arm64
          runtime:
            type: Cloud
            spec: {}
          execution:
            steps:
              - step:
                  type: Run
                  name: run_semgrep_scan
                  identifier: run_semgrep_scan
                  spec:
                    connectorRef: account.harnessImage
                    image: returntocorp/semgrep
                    shell: Sh
                    command: semgrep /harness --sarif --config auto -o /harness/results.sarif
                    envVariables:
                      SEMGREP_APP_TOKEN: <+secrets.getValue("MY_SEMGREP_KEY")>
              - step:
                  type: Semgrep
                  name: ingest_semgrep_data
                  identifier: ingest_semgrep_data
                  spec:
                    mode: ingestion
                    config: default
                    target:
                      name: <+input>
                      type: repository
                      variant: <+input>
                    advanced:
                      log:
                        level: info
                      fail_on_severity: critical
                    ingestion:
                      file: /harness/results.sarif
  properties:
    ci:
      codebase:
        connectorRef: MY_GIT_CONNECTOR
        repoName: <+input>
        build: <+input>