Skip to content

ROX-12238: Add node analysis package #911

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 9, 2022
Merged

ROX-12238: Add node analysis package #911

merged 3 commits into from
Sep 9, 2022

Conversation

jvdm
Copy link
Contributor

@jvdm jvdm commented Sep 5, 2022

This PR is built on top of #904 (please filter commits in the diff for a narrowed view of the changes).

Add Analyze() to pkg/analyzer/nodes, including related changes to allow external packages (in particular, Compliance) to consume it. node.Analyze is built on the recently introduced interfaces added to pkg/analyzer, and re-uses clair.DetectContentFromFile().

Other marginal changes:

  • Move file size limits to the analyzer package.
  • Add metrics for node scanning.

Tests

UTs.

Local validation, here's the code (omitting imports):

func main() {
	components, err := nodes.Analyze("localhost", "/", true)
	if err != nil {
		panic(err)
	}

	s, err := json.MarshalIndent(components, "", "    ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(s))
}

Next steps: E2E.

@jvdm jvdm force-pushed the jvdm/ROX-12238/2/pull branch from a506b38 to a5de780 Compare September 6, 2022 06:10
@stackrox stackrox deleted a comment from openshift-ci bot Sep 6, 2022
@ghost
Copy link

ghost commented Sep 6, 2022

Images are ready for the commit at 0ab82fb.

To use the images, use the tag 2.25.0-87-g0ab82fb88b.

@jvdm jvdm requested review from daynewlee, c-du and RTann September 6, 2022 17:35
@c-du
Copy link
Contributor

c-du commented Sep 7, 2022

This PR is built on top of https://github.com/stackrox/scanner/pull/904 (please filter commits in the diff for a narrowed view of the changes).

You can generate the diff by changing the merge into branch from master to 'jvdm/ROX-12238/1/pull' in this case.

@c-du c-du changed the base branch from master to jvdm/ROX-12238/1/pull September 7, 2022 07:41
matcher := requiredfilenames.SingletonOSMatcher()
files, err := extractFilesFromDirectory(rootFSdir, matcher)
if err != nil {
return Components{}, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious in case of error, why not returning (nil, err)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason. Happy to change if there is an exact concern on returning an empty struct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, it is recommended to use nil.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please share a reference of why this is the recommendation?

I've found https://github.com/golang/go/wiki/CodeReviewComments#declaring-empty-slices but that is for slices, which are treated as empty when nil. Could not find something specific for structs in general.

if err != nil {
return Components{}, nil
}
// File reading errors during analysis failed are not exposed to DetectFiles,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ... during failed analysis...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

fileExtractionErrorCountMetric = prometheus.NewHistogram(prometheus.HistogramOpts{
Name: "file_extraction_error_count",
Help: "Number of files in an node filesystem scan that failed to read",
Buckets: []float64{50, 100, 500, 1000},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 50 too small as error count histogram?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

return filepath.WalkDir(filepath.Join(n.root, dir), func(fullPath string, entry fs.DirEntry, err error) error {
if err != nil {
if filesIsNotAccessible(err) {
m.ErrorCount()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The errorcount is only for filesIsNotAccessible error? If so, is it good to specify it in the context of the error for the reader of the metrics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I follow. Are you suggesting we should modify the metric to make it more clear that it is tracking only inaccessible files, or something entirely different?

Notice that on every other error we fail the scan entirely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metrics by themselves suggest that it is the error count. Even if we fail the scan, the metrics readers are not in the context of the scan. You could either count all the errors or make it clear with either name or help of the metrics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Thanks for clarifying.

}, nil
}

// Get implements analyzer.Files
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain more the function of Get function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The details are explained in the documentation of analyzer.Files. If you have a better way of linking them together let me know.

@jvdm jvdm force-pushed the jvdm/ROX-12238/1/pull branch 2 times, most recently from 41b9775 to c257e53 Compare September 7, 2022 21:58
@jvdm jvdm changed the base branch from jvdm/ROX-12238/1/pull to master September 8, 2022 01:17
@jvdm jvdm changed the base branch from master to jvdm/ROX-12238/1/pull September 8, 2022 05:01
@jvdm jvdm force-pushed the jvdm/ROX-12238/2/pull branch from a5de780 to 1354404 Compare September 8, 2022 05:37
@jvdm jvdm changed the base branch from jvdm/ROX-12238/1/pull to master September 8, 2022 05:39
@jvdm
Copy link
Contributor Author

jvdm commented Sep 8, 2022

This PR is built on top of #904 (please filter commits in the diff for a narrowed view of the changes).

You can generate the diff by changing the merge into branch from master to 'jvdm/ROX-12238/1/pull' in this case.

If I target the PR into jvdm/ROX-12238/1/pull I won't trigger CI checks. Also, once 1/pull is merged, its commit will change (squash or rebased), and the target of this PR becomes obsolete.

@jvdm jvdm requested a review from c-du September 8, 2022 05:49
@jvdm jvdm force-pushed the jvdm/ROX-12238/2/pull branch from 1354404 to 0ab82fb Compare September 8, 2022 20:46
@jvdm
Copy link
Contributor Author

jvdm commented Sep 8, 2022

/retest

@openshift-ci
Copy link

openshift-ci bot commented Sep 8, 2022

@jvdm: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-tests 0ab82fb link false /test e2e-tests

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jvdm jvdm merged commit ce27aff into master Sep 9, 2022
@jvdm jvdm deleted the jvdm/ROX-12238/2/pull branch September 9, 2022 14:50
// the files of a filesystem or image layer. For layers, the parent layer should
// be specified. For filesystems, which don't have the concept of intermediate
// layers, or the root layer, use `nil`. Notice that language components are not
// extracted by DetectFromFiles, but if provided they are annotated with
Copy link
Collaborator

@RTann RTann Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if provided they are annotated with certified RHEL dependencies

Not sure I follow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Language components are not detected by this function, rather they are detected by the analyzers embedded in the matcher. But we pass components here so that the certified RHEL path will annotate the components with package info.

Personally, I would leave that piece out of the detector, arguing that it should be its responsibility to annotate language components. But OK the way it is.

fileExtractionMatchCountMetric.Observe(m.fileCount)
fileExtractionInaccessibleCountMetric.Observe(m.inaccessibleCount)
}
*m = FileExtractionMetrics{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if m == nil, then this will segfault

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


// Information about extracted files.
type fileMetadata struct {
// If true, the file has executable permissions.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this also a regular file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

// will stop returning entries and the error will be available at readErr().
// Analyzer are expected to handle missing files gracefully during analysis and
// check readErr() to confirm all potential data was read.
type filesMap struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a line

var _ analyzer.Files = (*filesMap)(nil)

See https://github.com/uber-go/guide/blob/master/style.md#verify-interface-compliance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


// readErr returns the first error encountered when reading file content, and
// resets the error, enabling re-scans using the same map.
func (n *filesMap) readErr() error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who calls this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just after clair.DetectFromFiles() (now clair.DetectComponents()), to verify if there was any error when reading files during detection.

wantErr: assert.NoError,
},
{
name: "when file is extractable and size is bigger then the limit then return nothing",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: than instead of then

wantExists: true,
},
{
name: "when file is in map and is extractable but does not exist, then keep error",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Between extraction and reading the file might have moved, was deleted, changed permission or something else happened to the underlying filesystem (e.g. failures).

if err != nil {
return nil, nil
}
// File reading errors during analysis are not exposed to DetectFiles, hence we
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: DetectFromFiles

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

jvdm added a commit that referenced this pull request Sep 9, 2022
@RTann RTann mentioned this pull request Sep 20, 2022
RTann added a commit that referenced this pull request Sep 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants