-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scan and report dependency groups of vulnerabilities #655
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #655 +/- ##
==========================================
+ Coverage 78.86% 78.92% +0.06%
==========================================
Files 84 85 +1
Lines 5943 6032 +89
==========================================
+ Hits 4687 4761 +74
- Misses 1054 1066 +12
- Partials 202 205 +3 ☔ View full report in Codecov by Sentry. |
I've not had a full look over this, but fwiw I think it would be better to make groups an array rather than a single string because some ecosystems support packages belonging to multiple groups i.e. at least Bundler and Poetry. While I don't think that's expressed in pretty much any of the lockfiles, given the accuracy goals of the scanner I don't think it's unreasonable to expect (/hope) that one day the scanner will be able to determine that (such as by looking at related files) and so having an array now should make it easier to support (otherwise it'll be a breaking change). I'm also not sure if it's worth making public constants for all the different types of groups, but I'm less fussed about that 🤷 |
We actually had a discussion about if we should make the group info an array (my first proposal was to make it an array). Considering the purpose is to help users prfioritize what vulnerabilities to handle, we may only need to know if a dependency is of the default group or not. Then I tried to just use a bool value to represent that, but this makes it harder to provide more information to users in the reporters, so I finally made this a string. However I think if we do care about the exact group names, we should make it an array. Regarding the constants, it is of the similar consideration - I would like to display the information as close as to the ecosystem terminology since this dependency group information is quite different among ecosystems. I am happy to remove them and just use the strings. Also when I was researching the dependency groups for different ecosystems, I realised that some ecosystems only have this information available in manifest but not lockfile, so if we want to have a better support on this, we should also support parsing manifests. |
Yeah it comes down to what your priorities are - personally I think at least in the lockfile/library space the scanner should be trying to capture info in a generally neutral manner i.e. it should not decide which group of dependency takes priority, it should just capture all groups a package is part of. Instead deciding how to treat the different groups should be a downstream responsibility (e.g. being handled by the scanner CLI component, scorecard who are just using the library, etc). Case and point, NPM records if a package is a dev dependency, peer dependency, and optional dependency, and you demonstrate in your test cases its possible for a dependency to be both dev and optional - you have then had to choose one to favor because you cannot record both (incidentally I believe you should be favoring "dev" not "optional" because the latter will always be installed if possible, where the former might be excluded in production type builds). You're also not capturing the groupings for |
Agree with the point that the parser should capture all the relevant information and let the downstream to decide. This reminds me of one thing about making groups as an array: how we record the default group?
|
Also regarding the parsers, feel free to point out anything that I am not doing it correctly! I am not an expert on all the ecosystems, and some of the implementation is just based on my brief research. There is also a TODO for Gradle: I saw |
For now I think it should be fine to just capture groups as they are - so in the case of e.g. NPM which doesn't record I'm pretty sure the only ecosystem currently where that might cause confusion is with
You're doing fine 🙂 while I wrote most of them and all of |
internal/output/table.go
Outdated
outputRow = append(outputRow, pkg.Package.Ecosystem, pkg.Package.Name, pkg.Package.Version) | ||
name := pkg.Package.Name | ||
if len(pkg.DepGroups) > 0 { | ||
name += fmt.Sprintf(" (%s)", strings.Join(pkg.DepGroups, ",")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, can we perhaps be more opinionated here in the table output, and just output e.g. "(dev)" in the cases where it is a dev dependency?
Users may not care about optional etc most of the time, and it would just clutter this table. If they do want the exact group details, they can use the JSON output.
We can define perhaps some kind of per-ecosystem function which just does:
type Ecosystem interface {
IsDevGroup(groups []string) bool
}
or something similar per ecosystem?
@another-rex @G-Rath wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ecosystem is now a string
can we still make it an interface?
Or, we can have something like this?
func (s Ecosystem) IsDevGroup(groups []string) bool {
switch s {
case CargoEcosystem:
...
case NpmEcocystem:
...
}
...
return false;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree we should just stick to a single output in the table output, but I think we should do it with a separate column.
I'm thinking "prod" column with a checkmark if it's a production dependency, and empty if not a production dep. This way it makes it applicable to all the other not "dev" named dependencies (e.g. optional in npm).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re column, I think the reason we went with a "(dev)" marker for now is that it because the column becomes redundant whenever we do/support any kind of filtering, and makes things inconsistent.
This is also somethign we can change at any time, given that there's no stability guarantees for the table/human readable output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a method on Ecosystem
to check if there is dev
group in the provided groups.
pkg/lockfile/dpkg-status.go
Outdated
@@ -122,7 +124,7 @@ func (e DpkgStatusExtractor) Extract(f DepFile) ([]PackageDetails, error) { | |||
|
|||
// PackageDetails does not contain any field that represent a "not installed" state | |||
// To manage this state and avoid false positives, empty struct means "not installed" so skip it | |||
if (PackageDetails{}) == pkg { | |||
if reflect.DeepEqual(PackageDetails{}, pkg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my understanding is that pulling in reflect
can be a big deal in Go, and at at least in the past with some methods (which tbh probably don't include DeepEqual
) could trigger sizeable deoptimizations.
I don't think this check is anything special and very internal so could be easily replaced with something like returning a pointer (so that we can then return nil
), returning an error
that we can eat here, or setting the Name
to something like <not installed>
.
I think that's probably worth doing either way so that we're not as dependent on the whole structure of PackageDetails
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified the code here to check each field in PackageDetails
and if they are all "empty". wdty?
pkg/lockfile/parse-pnpm-lock.go
Outdated
@@ -23,6 +23,7 @@ type PnpmLockPackage struct { | |||
Resolution PnpmLockPackageResolution `yaml:"resolution"` | |||
Name string `yaml:"name"` | |||
Version string `yaml:"version"` | |||
Dev string `yaml:"dev"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we be able to parse this as a bool directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch - I remember bool
didn't work so I put string
here but I just tried this again and bool
is perfectly fine.
this just makes it easier for our users to use.
Note that the `SBOMReader` refactor wasn't required for this, but my IDE flagged it so I just included it here 🤷
[![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com) This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [deps.dev/api/v3alpha](https://togithub.com/google/deps.dev) | require | digest | `a2ccd03` -> `e40c4d5` | | golang.org/x/exp | require | digest | `7918f67` -> `9a3e603` | | golang.org/x/term | require | minor | `v0.13.0` -> `v0.14.0` | --- ### Configuration 📅 **Schedule**: Branch creation - "before 6am on monday" in timezone Australia/Sydney, Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 👻 **Immortal**: This PR will be recreated if closed unmerged. Get [config help](https://togithub.com/renovatebot/renovate/discussions) if that's undesired. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://developer.mend.io/github/google/osv-scanner). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy40Ni4wIiwidXBkYXRlZEluVmVyIjoiMzcuNDYuMCIsInRhcmdldEJyYW5jaCI6Im1haW4ifQ==-->
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cuixq!! This looks awesome. Sorry for the delay in reviewing this.
Out of curiosity, what's the difference between PackageVulns and VulnerabilityFlattened? I see |
good find - we should add |
Issue #332
Non-default dependency groups are recorded in strings as per eco-system:
packages-dev
build-requires
andpython-requires
<scope/>
in<dependency/>
dev
andoptional
dependenciesdevelop
dev
as trueoptional = true
dev
Reporters:
abc (development)
dependencyGroups