-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(license): stop spliting a long license text #7336
Conversation
@knqyf263 i think it's ready for review. |
Is there a way to distinguish between the license name and the license text? |
Right now, I'm not sure. |
I want to show "unknown" for the license text. |
My concern is next. cat /usr/share/doc/python3.9-minimal/copyright but if I understand correctly it's a mistake. |
My concern is that there is no a correct way to distinguish between incorrect |
@knqyf263 I have an idea. trying |
@knqyf263 Could you confirm that I understand correctly this requirement? thanks |
I know it's not ideal, but what if checking the length and the number of newlines?
|
I don't want to show the license text there as it's too long. I thought we would show "UNKNOWN", but we know a license text. We just don't know the short name. How about "Custom"? Then, the license text can be stored in another field. |
@knqyf263 I tried this way, you're right, it's not ideal. Counting new lines don't affect on the output, because Trivy reads only one line from license in dpkg, python packages also contain a long single line license... so this check is always true. About check text length. It works for long linceses in python. i thought it's a long text, but actually it's a few first rows of several licenses: License: Redistribution and use in source and binary forms, with or without
License: By obtaining, using, and/or copying this software and/or its
License: Permission to use, copy, modify, and distribute this software and
License: Redistribution and use in source and binary forms, with or without
License: This software is provided 'as-is', without any express or implied
License: Permission to use, copy, modify, and distribute this software and
License: Permission is hereby granted, free of charge, to any person
License: This software is provided 'as-is', without any express or implied
License: Permission is hereby granted, free of charge, to any person obtaining
under the terms of the GNU General Public License as published by the
section entitled ``GNU General Public License''.
License: Permission to use, copy, modify, and distribute this software and its
License: Permission to use, copy, modify, and distribute this software and its
License: This software is provided 'as-is', without any express or implied
License: Permission is hereby granted, free of charge, to any person obtaining
License: Redistribution and use in source and binary forms, with or without
License: This software is provided as-is, without express or implied
License: Permission to use, copy, modify, and distribute this software for any
License: Permission to use, copy, modify, and distribute this software and its
License:
License: * Permission to use this software in any way is granted without
License: Permission to use, copy, modify, and distribute this software and its |
Right now, I can't see a good solution, but there are several options:
@knqyf263 wdyt? |
You mean we should print |
What if we add one more check for
e.g.
My logic is as follows:
|
@DmitriyLewen that's an interesting idea. there are next cases for perl and python packages: perl: License: GPL-1+ or Artistic or Artistic-dist python3.9: License: This software is provided 'as-is', without any express or implied i'm not sure we can separate these cases, but maybe if we also will check string length... a long string with a few splited licenses is a text. |
Yeah. That's what I thought
What if use 30 characters + no saved licenses found |
Exactly |
@knqyf263 @DmitriyLewen There were selected a few obvious words, that can appear inside license texts only. Please, take a look at this suggestion when you have free time. thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@afdesk I left comments. Take a look, please.
I tried severel ways to separate a long license text from a license name, and the best result for me is a detection by keywords.
Can you give examples of problems for other methods to save time if we need to come back to this question?
pkg/scanner/local/scan.go
Outdated
Severity: dbTypes.SeverityUnknown.String(), | ||
Category: ftypes.CategoryUnknown, | ||
PkgName: pkg.Name, | ||
Name: "CUSTOM License", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to add info when we use CUSTOM License
in docs and license Name
/log message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to add the first few words of the license text to the name
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to add the first few words of the license text to the
name
.
if @knqyf263 agrees too, I'll add it.
I thought about filtering by CUSTOM License
, but maybe it doesn't matter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about CUSTOM License
again. This may not be clear to users.
What if we use the name Incomparable License
/Unmatched License
This means that we can't compare license text with known Trivy licenses and therefore store license as text.
cc. @knqyf263
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw some places calling them "custom licenses", so I don't think it's that bad.
Custom licenses are strongly disfavored and should be used only in extraordinary circumstances. An Agency ought to adopt a license approved by the Open Source Initiative (OSI) at opensource.org.
https://opensource.org/authority
If you are using a license that hasn't been assigned an SPDX identifier, or if you are using a custom license, use a string value like this one:
https://docs.npmjs.com/cli/v10/configuring-npm/package-json#license
Alternatively, you can use a LicenseRef- custom license identifier to refer to a license that is not on the SPDX License List, such as the following:
https://spdx.github.io/spdx-spec/v2.3/using-SPDX-short-identifiers-in-source-files/
Or "non-standard" or something like that.
Incomparable License
/Unmatched License
are also unclear what these did not match.
pkg/licensing/normalize.go
Outdated
func SplitLicenses(str string) []string { | ||
if str == "" { | ||
return nil | ||
} | ||
if isLicenseText(strings.ToLower(str)) { | ||
return []string{ | ||
"text://" + str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use file://
prefix in Python.
What if we create constants for these prefixes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
The main problem is to separate a license name and a license text. Checking the length doesn't work because there are correct long lincenses (it's already added to test cases):
the number of newlines doesn't work too, because we read only one line from copyright or license files. the number of spaces doesn't work, because there are too long correct license names (ex CDDL-1.0).
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@knqyf263 wdyt about this way?
rpc/cache/service.pb.go
Outdated
@@ -1,6 +1,6 @@ | |||
// Code generated by protoc-gen-go. DO NOT EDIT. | |||
// versions: | |||
// protoc-gen-go v1.27.1 | |||
// protoc-gen-go v1.34.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like we can skip this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change results the versions of protoc-gen-go
for cache/
and common
to the same version. it was build automatically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed this update for this PR)
OK, we don't have an easy way. Let's see how it goes. |
Should i fix something to add this PR in 0.55? |
I'm reviewing the changes now. I'll update you soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored but realized there was no test. I'm not sure my changes work as expected.
@afdesk Could you add a test for license texts?
pkg/types/license.go
Outdated
@@ -22,6 +22,9 @@ type DetectedLicense struct { | |||
// Name holds a detected license name | |||
Name string | |||
|
|||
// LicenseText holds a long license text if Trivy detects a license name as a license text | |||
LicenseText string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Since this is "DetectedLicense", Text
looks enough.
LicenseText string | |
Text string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
rpc/common/service.proto
Outdated
string name = 5; | ||
float confidence = 6; | ||
string link = 7; | ||
string license_text = 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
yeh, sure. the test is added |
Description
When we looks for licenses Trivy tries to split information about license through a regex.
but for some cases
License
field contains a long descriptive text.This PR adds a detection of a long license text and keep it inside a new field -
License
, as Dmitriy suggested.LinceseText
field is available for JSON format only. for TABLE format Trivy showsCUSTOM License
name instead of a long text.For tests I use next image:
Before:
Afrer:
JSON output:
Related issues
License
field from python packaging #5204Checklist