Skip to content

feat: allow controlling which extractors are enabled #1846

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Jun 3, 2025

Conversation

G-Rath
Copy link
Collaborator

@G-Rath G-Rath commented Apr 28, 2025

This allows more flexibility by enabling users to control which extractors are enabled (or disabled), similar to call scan analysis.

Resolves #1768

@G-Rath G-Rath changed the title Extractors/toggle2 feat: allow controlling which extractors are enabled Apr 28, 2025
@G-Rath
Copy link
Collaborator Author

G-Rath commented Apr 28, 2025

todos:

  • add documentation
  • rename cli flags (it should be something like --disable-extractors, or maybe even "extractor" since you pass them in one at a time?)
  • switch to passing the extractors themselves in osvscanner.ScannerActions rather than just their names
    • while the issue originally called for the names as strings, the end goal is to be able to pass in custom extractors/plugins, and so I'm pretty sure it costs us very little to start doing that now which might save us some deprecation work
  • sort out what to do with sboms, as we currently have a unique flag for them which in the long-run we probably ideally want to get merged into -L and co

@G-Rath G-Rath force-pushed the extractors/toggle2 branch 4 times, most recently from 4fb3cc2 to dd8981c Compare April 29, 2025 00:02
@another-rex
Copy link
Collaborator

another-rex commented Apr 29, 2025

rename cli flags (it should be something like --disable-extractors, or maybe even "extractor" since you pass them in one at a time?)

I'm pretty sure cli allows you to pass it in comma separated?

@G-Rath
Copy link
Collaborator Author

G-Rath commented Apr 29, 2025

Yeah I'm going to refresh my memory on that, because I'm pretty sure you're right but then we were doing custom splitting for our generic license flag and so maybe we actually have a bug with that flag? so I'm going to double check what we're actually doing for all our list flags to make sure its consistent 😅

@G-Rath G-Rath force-pushed the extractors/toggle2 branch from a4384a9 to defd573 Compare April 29, 2025 00:16
@codecov-commenter
Copy link

codecov-commenter commented Apr 29, 2025

Codecov Report

Attention: Patch coverage is 91.37931% with 20 lines in your changes missing coverage. Please review.

Project coverage is 65.68%. Comparing base (1066081) to head (8edae54).
Report is 89 commits behind head on main.

Files with missing lines Patch % Lines
...nal/scalibrextract/filesystem/vendored/vendored.go 72.00% 6 Missing and 1 partial ⚠️
internal/scalibrextract/vcs/gitrepo/extractor.go 69.56% 6 Missing and 1 partial ⚠️
...nguage/java/pomxmlenhanceable/pomxmlenhanceable.go 85.71% 3 Missing ⚠️
pkg/osvscanner/scan.go 92.30% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1846      +/-   ##
==========================================
+ Coverage   65.39%   65.68%   +0.28%     
==========================================
  Files         164      166       +2     
  Lines       15926    16063     +137     
==========================================
+ Hits        10415    10551     +136     
  Misses       4847     4847              
- Partials      664      665       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@another-rex another-rex self-requested a review May 12, 2025 20:01
Copy link
Collaborator

@another-rex another-rex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach LGTM!

I do quite like the configure idea, I'll have a look and see if this will work for configuring plugins in scalibr as well.

As for passing extractors as types directly into the DoScan function, this does mean we'll have to make all the extractors public right? I'm not sure we want to do that in this repo, and probably will have to migrate it into OSV-Scalibr.

toDisable := make(map[string]bool)

for _, disabled := range disabledExtractors {
if names, ok := presets[disabled]; ok {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can get rid of the if check, if preset doesn't exist, then the for loop wouldn't run.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same below

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right that we don't need the condition for the disabled section, but we do need it for the enabled section otherwise the preset name itself will end up being added as an extractor

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah makes sense.

"artifact": scalibrextract.ExtractorsArtifacts,
}

func ResolveEnabledExtractors(enabledExtractors []string, disabledExtractors []string) []string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Personally think it's clearer to add all the enabledExtractors in a set, then loop through all the disabled extractors to disable them, then convert to slice.

names := actions.ExtractorNames

if len(names) == 0 {
for _, preset := range defaultExtractorNames {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: slices.Concat can be used here. I think that means we can get rid of the [][]string and just have a single [] and have the caller call slices.Concat themselves.

// since "IncludeRootGit" is always true
gitrepo.Configure(tor, gitrepo.Config{
IncludeRootGit: actions.IncludeGitRoot,
Disabled: !actions.IncludeGitRoot,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me realize that this might just be a bug at the moment, I don't think I intended skip root git to also skip subdirectories as well, but at the same time I guess we currently have no flags to disable skipping submodule scanning before this PR is landed.

Now that we do have a way to disable it, can you change this to also scan submodules? (We definitely should mention this behavior change in our changelog.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was going to talk to you about that + the wider "we should probably make our extractors public in some way"

if it's ok with you, I'd like to leave our extractors alone in this PR and change them as a follow-up so that they've got their own changelog entries

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep that sounds good

@G-Rath G-Rath force-pushed the extractors/toggle2 branch from 89c9433 to 232dfa6 Compare May 20, 2025 03:44
@G-Rath
Copy link
Collaborator Author

G-Rath commented May 20, 2025

As for passing extractors as types directly into the DoScan function, this does mean we'll have to make all the extractors public right? I'm not sure we want to do that in this repo, and probably will have to migrate it into OSV-Scalibr.

Yes, but that shouldn't need to happen right away because you can archive the existing functionality by just not passing in any extractors - so anyone wanting to use just those extractors specifically would already need to make a feature request to have them extracted out; in the meantime, people wanting to use our extractors and their own extractors should be able to double-call the library

(remember too this is all currently experimental, so that fits nicely I think)

@G-Rath G-Rath requested review from another-rex, hogo6002 and cuixq May 20, 2025 04:33
@G-Rath G-Rath force-pushed the extractors/toggle2 branch 2 times, most recently from 84c6cf6 to 7c58398 Compare May 20, 2025 21:22
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a very initial start of a doc, which has ended up being mainly just a capturing of what I think are key things for us to dive into as I realized while this PR does do what we want, there's at least a couple of key changes we should make around it like cleaning up our private extractors and having all extractors run (aka merge the lockfiles, directories, and sboms presets) which would make this documentation much simpler.

on an aside, I've added this in a manner that leverages Jekyell more (notably using subdirectories to imply the subpath) which we might like to actually do - if so I can do a restructure PR but want to check we actually like moving away from a flat structure

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's mark this page as hidden for now to get the logic merged in (I think function wise it's still mostly the same right? (if they just don't enable/disable extractors), and once we clean up how our different flags work we can re-enable it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this as part of working on the docs as I realized our current way of doing presets.go is very opaque, so I think it might be better if we generate them as strings so that tools and humans can look at the file statically to figure out the actual names of extractors.

I have run it locally to confirm it works, but have not committed the result since we need to talk about the broader picture first

Copy link
Collaborator

@another-rex another-rex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's get this merged in as it is, and we can work on a few of the followup things:

  • go generate for generating a list of available plugins
  • Cleanup SBOM flags
  • Add docs
  • skip git behavior

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's mark this page as hidden for now to get the logic merged in (I think function wise it's still mostly the same right? (if they just don't enable/disable extractors), and once we clean up how our different flags work we can re-enable it.

toDisable := make(map[string]bool)

for _, disabled := range disabledExtractors {
if names, ok := presets[disabled]; ok {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah makes sense.

@G-Rath G-Rath force-pushed the extractors/toggle2 branch from 7c58398 to 8f1c054 Compare May 28, 2025 04:00
@G-Rath
Copy link
Collaborator Author

G-Rath commented May 28, 2025

@another-rex I've removed both the doc and generator files for now - they're captured by GitHub in this PRs history so they'll be easy to bring back

@G-Rath G-Rath marked this pull request as ready for review May 28, 2025 04:01
@G-Rath G-Rath force-pushed the extractors/toggle2 branch 3 times, most recently from 43c1f0d to 10f2feb Compare May 28, 2025 19:52
@G-Rath G-Rath force-pushed the extractors/toggle2 branch from 676cc8b to 8edae54 Compare June 3, 2025 00:51
@another-rex another-rex merged commit 87a2039 into google:main Jun 3, 2025
15 checks passed
@another-rex another-rex deleted the extractors/toggle2 branch June 3, 2025 01:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support selectively enabling/disabling individual plugins
3 participants