Support describing license properties and SPDX expression assertions #1577

Closed
@spiffcs

Description

Syft License Revamp

Syft currently represents licenses with different data types depending on where they appear in the schema:

AlpmMetadata: string (required)
ApkMetadata: string (required)
GemMetadata: []string (optional)
NpmPackageJSONMetadata: []string (required)
Package: []string (required)
PhpComposerJSONMetadata: []string (optional)
PythonPackageMetadata: string (required)
RpmMetadata: string (required)

Specifically, the Package []string construct has proven limited in how license data can be presented to a user interested in license compliance. Many packages now use SPDX license identifiers to communicate FOSS license information. These identifiers are currently incompatible with how we represent licenses, because some SPDX expressions are compound and cannot be captured as a flat list of strings. Example:

// SPDX-License-Identifier: Apache-2.0 AND (MIT OR GPL-2.0-only)

NOTE FROM COMMUNITY MEET:

  • String will not be deprecated, but possibly the AST (abstract syntax tree) will be the preferred representation

The SPDX expression above describes a case where the consumer of the software must comply with Apache-2.0 and with one of the following: MIT or GPL-2.0-only.

The file is subject to both the Apache-2.0 license, and at the licensee’s choice either the MIT license or version 2.0 only of the GPL.
The licensee may choose between MIT and GPL-2.0-only.
Whichever they choose, they must comply with both that license and Apache-2.0.
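To make the AND/OR semantics concrete, here is a minimal sketch of the expression above as a small tree. The names Expr, ID, And, Or, and SatisfiedBy are hypothetical and invented for illustration; they are not syft types or part of any SPDX library, and the tree is built by hand rather than parsed.

```go
package main

import "fmt"

// Expr is a node in a tiny license-expression tree (hypothetical type,
// not part of syft or any SPDX library).
type Expr interface {
	// SatisfiedBy reports whether complying with the chosen set of
	// license IDs is enough to satisfy this node.
	SatisfiedBy(chosen map[string]bool) bool
}

// ID is a leaf: a single SPDX license identifier.
type ID string

// And requires every operand to be satisfied.
type And []Expr

// Or requires at least one operand to be satisfied.
type Or []Expr

func (l ID) SatisfiedBy(chosen map[string]bool) bool { return chosen[string(l)] }

func (a And) SatisfiedBy(chosen map[string]bool) bool {
	for _, e := range a {
		if !e.SatisfiedBy(chosen) {
			return false
		}
	}
	return true
}

func (o Or) SatisfiedBy(chosen map[string]bool) bool {
	for _, e := range o {
		if e.SatisfiedBy(chosen) {
			return true
		}
	}
	return false
}

// example builds Apache-2.0 AND (MIT OR GPL-2.0-only) by hand.
func example() Expr {
	return And{ID("Apache-2.0"), Or{ID("MIT"), ID("GPL-2.0-only")}}
}

func main() {
	expr := example()
	// Apache-2.0 plus one of the alternatives satisfies the expression.
	fmt.Println(expr.SatisfiedBy(map[string]bool{"Apache-2.0": true, "MIT": true}))
	// MIT alone does not, because Apache-2.0 is always required.
	fmt.Println(expr.SatisfiedBy(map[string]bool{"MIT": true}))
}
```

A flat []string of the three identifiers would lose exactly this distinction between "all of" and "one of".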

Furthermore, syft's current license format cannot represent the distinction between DECLARED and CONCLUDED licenses.

The SPDX format lets implementers decide whether a license belongs in the concluded license field or the declared license field:

Concluded

TODO: Update this description based on feedback from community meeting
Contain the license the SPDX document creator has concluded as governing the package or alternative values, if the governing license cannot be determined.

If the Concluded License is not the same as the Declared License (7.15), a written explanation should be provided in the Comments on License field (7.16). With respect to NOASSERTION, a written explanation in the Comments on License field (7.16) is preferred. If the Concluded License field is not present in a package, it implies an equivalent meaning to NOASSERTION.

Declared

List the licenses that have been declared by the authors of the package. Any license information that does not originate from the package authors, e.g. license information from a third-party repository, should not be included in this field.
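The rules quoted above for the Concluded field can be sketched as a small helper. This is only an illustration of the quoted text: the function name concludedField and its string-based signature are assumptions, not syft or SPDX tooling APIs.

```go
package main

import "fmt"

const noAssertion = "NOASSERTION"

// concludedField (hypothetical helper) returns the value to write in the
// Concluded License field and whether an explanation in the Comments on
// License field is called for by the quoted guidance.
func concludedField(declared, concluded string) (value string, wantsComment bool) {
	if concluded == "" {
		// An absent Concluded License is equivalent to NOASSERTION,
		// and an explanation is preferred for NOASSERTION.
		return noAssertion, true
	}
	// A comment should be provided when concluded differs from declared.
	return concluded, concluded != declared
}

func main() {
	value, wantsComment := concludedField("MIT", "Apache-2.0")
	fmt.Println(value, wantsComment)
}
```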

Syft's approach

Syft should enhance the license representation from []string to []License in order to convey the above information more clearly. The following structs will be added in place of string so that downstream tooling has more information about how each license was determined during a syft run:

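type Licenses struct {
    SPDXExpression string    `json:"spdxLicenseExpression"` // expression used to derive the licenses below
    Licenses       []License `json:"licenses"`              // licenses and their given metadata
}

type License struct {
    Name       string   `json:"name"`
    Location   Location `json:"location"`
    Concluded  bool     `json:"concluded"`  // if false, we can assume declared? NOTE: update this from the meeting notes about when a license should be marked concluded
    Confidence float64  `json:"confidence"` // float so that values such as 0.92 can be represented
    Offset     int      `json:"offset"`
    Extent     int      `json:"extent"`
}

type Location struct {
    Path    string `json:"path"`
    LayerID string `json:"layerID"`
}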

Here is a sample of the JSON representation of the above:

{
  "spdxLicenseExpression":"MIT AND (LGPL-2.1-or-later OR BSD-3-Clause)",
  "licenses":[
    {
      "name":"LGPL",
      "location":{
        "path":"/lib/apk/db/installed",
        "layerID":"sha256:ded7a220bb058e28ee3254fbba04ca90b679070424424761a53a043b93b612bf"
      },
      "concluded":true,
      "confidence":0.92,
      "offset":0,
      "extent":23829
    }
  ]
}

When a license is successfully concluded, the data above comes from the Google license classifier, which assesses the license text packaged with the software. It provides the confidence level (how closely the contents at the location match an entry in a source DB), the offset (how far into the file the match was found), and the extent (how long the match was).

Why is this needed:
This enhancement is needed so that syft can better represent the intent of SPDX license expressions, record where a concluded license was found, and give downstream tools that use SBOMs for license compliance more accurate data for assessing license contents against the policies they define.

Metadata

Labels: enhancement (New feature or request), license (relating to software licensing)
