
Performance issues from scanSBOMFile reading every file multiple times #257

Closed

Description

Scorecard uses osvscanner.DoScan when performing its Vulnerabilities check. This check takes more than an order of magnitude longer to complete than the other checks. Running pprof shows a hot spot in scanSBOMFile.

It looks like when walking a directory, every file is potentially parsed as an SBOM:

// No need to check for error
// If scan fails, it means it isn't a valid SBOM file,
// so just move onto the next file
_ = scanSBOMFile(r, query, path)

There are currently two providers, SPDX and CycloneDX. While the SPDX provider is skipped unless the filename matches, there's no such check for CycloneDX:

for _, provider := range sbom.Providers {
	if provider.Name() == "SPDX" &&
		!strings.Contains(strings.ToLower(filepath.Base(path)), ".spdx") {
		// All spdx files should have the .spdx in the filename, even if
		// it's not the extension: https://spdx.github.io/spdx-spec/v2.3/conformance/
		// Skip if this isn't the case to avoid panics
		continue
	}
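
For comparison, a similar filename guard for CycloneDX could be sketched as below. This is only an illustration, not osv-scanner code: looksLikeCycloneDX is a hypothetical helper, and the patterns (bom.json, bom.xml, *.cdx.json, *.cdx.xml) are, to my understanding, the file names the CycloneDX specification recommends. Files named differently would be skipped, so whether that trade-off is acceptable is an open question.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// looksLikeCycloneDX is a hypothetical helper (not part of osv-scanner).
// It reports whether the base name of path matches one of the file name
// patterns assumed here to be recommended for CycloneDX documents:
// bom.json, bom.xml, *.cdx.json, or *.cdx.xml.
func looksLikeCycloneDX(path string) bool {
	base := strings.ToLower(filepath.Base(path))
	switch {
	case base == "bom.json", base == "bom.xml":
		return true
	case strings.HasSuffix(base, ".cdx.json"), strings.HasSuffix(base, ".cdx.xml"):
		return true
	}
	return false
}

func main() {
	// Print the verdict for a few sample paths.
	for _, p := range []string{"bom.json", "dist/app.cdx.xml", "main.go", "README.md"} {
		fmt.Printf("%-18s %v\n", p, looksLikeCycloneDX(p))
	}
}
```

With a guard like this in the provider loop, most files in a typical repository would be rejected by a string comparison instead of a full decode attempt.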

I believe this means attempting to parse every file as a CycloneDX SBOM twice, once per entry in cycloneDXTypes. In large repositories, this adds up (the longest run observed is 5 minutes):

func (c *CycloneDX) GetPackages(r io.ReadSeeker, callback func(Identifier) error) error {
	var bom cyclonedx.BOM
	for _, formatType := range cycloneDXTypes {
		_, err := r.Seek(0, io.SeekStart)
		if err != nil {
			return fmt.Errorf("failed to seek to start of file: %w", err)
		}
		decoder := cyclonedx.NewBOMDecoder(r, formatType)
		err = decoder.Decode(&bom)


Labels: bug (Something isn't working)
