Skip to content

ROX-12967: Fix RHCOS detection and namespace generation #1026

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jan 3, 2023

Conversation

Maddosaurus
Copy link
Contributor

@Maddosaurus Maddosaurus commented Dec 2, 2022

This PR fixes the bug that RHCOS is detected by the CentOS RegEx.
Additionally, it introduces an alternative rpm db location, as RHCOS keeps its db at /usr/share/rpm as opposed to RHEL, which keeps it at /var/lib/rpm.
Finally, it extends the util function with another namespace check, as RHCOS should also be treated as a well known certified platform.

A test was extended to ensure the correct detection and namespace format of RHCOS.
As discussed in Thursdays sync meeting with @RTann, we explicitly return the version including minor (e.g. rhcos:4.11 as opposed to rhel:8).
I decided for a standalone RegEx instead of extending the RHEL RegEx, as I didn't want to change how RHEL versions are handled and still return a major.minor version for RHCOS.

Caveat: As the range matcher.GetCommonPrefixDirs() call leads to an extremely intensive filesystem walk, I introduced a very limited set of directories to search for now. We already have a follow-up item planned to investigate and improve the runtime behaviour of this code.

For these changes to run successfully in a Compliance container, an additional emptyDir mount is needed for Compliance at /tmp, as the code copies the RPM DB and opens the copy in /tmp

Testing:
I verified this on a UBI8 image as well as an unpacked RHCOS filesystem (with the help of local-nodescanner, #1027 ).

  • UBI8 Image running local-nodescanner, scanning UBI8 (ROX-12967: Introduce local-nodescanner #1027)
  • UBI8 Image running local-nodescanner, scanning an unpacked RHCOS ISO image on filesystem
  • Compliance image with feature flag enabled, running this Sensor branch on an OS 4.11.17 cluster (Infra-provisioned)

Copy link
Contributor

@vikin91 vikin91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good!
Could we test it on Openshift nodes before merging?

@@ -20,7 +20,7 @@ func IsCentOSNamespace(namespace string) bool {
// The namespace is expected to be of form `namespacename:version` but only `namespacename` is required.
// For example: rhel:7, rhel:8, centos:8, ubuntu:14.04.
func IsRHELNamespace(namespace string) bool {
return strings.HasPrefix(namespace, "rhel")
return (strings.HasPrefix(namespace, "rhel") || strings.HasPrefix(namespace, "rhcos"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are sure that rhcos is the correct string that we will get here?
Could we simply return rhel from the regexp and skip this modification?

I am not suggesting anything, just asking as I truly don't know myself :) @RTann help :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, we ended our sync meeting yesterday (Thursday) with the notion that we want to identify RHCOS as rhcos:4.11, hence this check, to ensure we identify it as discussed and still enter the verified scanning functionality.
But I'm open to changing this should I have misunderstood that :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking we make a separate IsRHCOSNamespace and then do the || when used by the client. What do you think?

@ghost
Copy link

ghost commented Dec 2, 2022

Images are ready for the commit at 9fcc0a7.

To use the images, use the tag 2.27.x-29-g9fcc0a7815.

@Maddosaurus
Copy link
Contributor Author

Maddosaurus commented Dec 5, 2022

We do run into problems when executing this on a real running OS4 node.
The OS is correctly identified as rhcos:4.11, but no Components are discovered.
I'm tracking down the problem ATM. What I know so far: The package DB location (/var/lib/rpm) is never added to the filesMap.files (node.go#43) array, so the code that reads the components from that DB is not able to find it.
This behaviour does neither show on RHEL (directly on RHEL as well as unpacked RHCOS ISO filesystem) nor on MacOS with an unpacked filesystem.

@@ -20,7 +20,7 @@ func IsCentOSNamespace(namespace string) bool {
// The namespace is expected to be of form `namespacename:version` but only `namespacename` is required.
// For example: rhel:7, rhel:8, centos:8, ubuntu:14.04.
func IsRHELNamespace(namespace string) bool {
return strings.HasPrefix(namespace, "rhel")
return (strings.HasPrefix(namespace, "rhel") || strings.HasPrefix(namespace, "rhcos"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking we make a separate IsRHCOSNamespace and then do the || when used by the client. What do you think?

@Maddosaurus
Copy link
Contributor Author

The updated code can now run and produce results if executed as local-nodescanner directly on the Openshift 4 node.
If run in Compliance, we run into issues:

With the additional changes and with a manually mounted emptyDir at /tmp, I am one step further.
The scan now successfully identifies the DB, but is unable to restore it. I will continue the investigation from here.

No certificates found in /usr/local/share/ca-certificates
No certificates found in /etc/pki/injected-ca-trust
main: 2022/12/06 17:22:28.636143 main.go:249: Info: Running StackRox Version: 3.73.x-95-g3b4b3561a9-dirty
main: 2022/12/06 17:22:28.637229 main.go:257: Info: Initialized Sensor gRPC stream connection
main: 2022/12/06 17:22:28.637350 main.go:274: Info: Node Rescan interval: 4h0m0s
main: 2022/12/06 17:22:28.637475 main.go:285: Info: Using NodeScan
time="2022-12-06T17:22:28Z" level=info msg="add files from directory" directory=etc/ root=/host
main: 2022/12/06 17:22:28.697936 main.go:243: Info: Successfully connected to Sensor at sensor.stackrox.svc:443
main: 2022/12/06 17:22:28.698723 main.go:94: Info: Starting audit log reader on node mat-0612-xmsd7-master-0.c.srox-temp-dev-test.internal in cluster d48d4377-f3c8-408d-b0cd-559073e8f28c using previously saved state: {
  "collectLogsSince": "2022-12-06T17:21:11.653779Z",
  "lastAuditId": "89753d51-e918-4bda-bf7b-5e314f77cc51"
})
time="2022-12-06T17:22:28Z" level=info msg="add files from directory" directory=usr/share/rpm root=/host
time="2022-12-06T17:22:28Z" level=info msg="add files from directory" directory=var/lib/rpm root=/host
added following files to filesMap: map[etc/os-release:0xc000a20c9c etc/redhat-release:0xc000a213d8 etc/system-release:0xc000e081dc etc/system-release-cpe:0xc000e081f0 usr/share/rpm/Packages:0xc000e085fb]
filesMap.Get full absPath: /host/etc/redhat-release
filesMap.Get full absPath: /host/etc/redhat-release
Trying known DB paths: map[Packages:{} rpmdb.sqlite:{}]
checking whether DB exists at var/lib/rpm/Packages
filesMap.Get full absPath: /host/usr/share/rpm/Packages
RHCOS DB exists!
checking whether DB exists at var/lib/rpm/rpmdb.sqlite
Created rpm db /tmp/rpm3788663974/Packages
time="2022-12-06T17:22:29Z" level=warning msg="failed to rebuild the rpm database: "
collection/nodescanv2: 2022/12/06 17:22:29.090198 nodescan.go:23: Info: Finished node inventory /host scan
collection/nodescanv2: 2022/12/06 17:22:29.090295 nodescan.go:27: Info: Components found under /host: <nil>
main: 2022/12/06 17:22:29.091558 main.go:243: Info: Successfully connected to Sensor at sensor.stackrox.svc:443

@Maddosaurus Maddosaurus marked this pull request as draft December 7, 2022 12:30
@Maddosaurus Maddosaurus marked this pull request as ready for review December 7, 2022 16:03
Copy link
Contributor

@vikin91 vikin91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

How does it work without this?

For these changes to run successfully in a Compliance container, an additional emptyDir mount is needed for Compliance at /tmp, as the code copies the RPM DB and opens the copy in /tmp

@@ -89,7 +89,7 @@ func extractFilesFromDirectory(root string, matcher matcher.PrefixMatcher) (*fil
files: make(map[string]*fileMetadata),
}
m := metrics.FileExtractionMetrics{}
for _, dir := range matcher.GetCommonPrefixDirs() {
for _, dir := range []string{"etc/", "usr/share/rpm", "var/lib/rpm"} { //TODO(ROX-13771): Use range matcher.GetCommonPrefixDirs() again after fixing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make the style-check happy:

Suggested change
for _, dir := range []string{"etc/", "usr/share/rpm", "var/lib/rpm"} { //TODO(ROX-13771): Use range matcher.GetCommonPrefixDirs() again after fixing
// TODO(ROX-13771): Use range matcher.GetCommonPrefixDirs() again after fixing
for _, dir := range []string{"etc/", "usr/share/rpm", "var/lib/rpm"} {

@@ -100,6 +103,7 @@ func DatabaseFiles() []string {
paths := make([]string, 0, databaseFiles.Cardinality())
for filename := range databaseFiles {
paths = append(paths, path.Join(databaseDir, filename))
paths = append(paths, path.Join(rhcosDbDir, filename))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a single append with two items:

paths = append(paths, path.Join(rhcosDbDir, filename), path.Join(databaseDir, filename))

@Maddosaurus
Copy link
Contributor Author

Looks great!

How does it work without this?

So, technically this is a requirement of this code running in the Compliance container.
If you run it, e.g. as local scanner, it will use your hosts' /tmp/ folder, so I would consider this requirement more for stackrox and less for this scanner code

@vikin91
Copy link
Contributor

vikin91 commented Dec 9, 2022

Could you maybe also handle the issue with this statement in this PR?

I would like to be aware of this error when it happens, so I would suggest either:

	if err != nil {
		return nil, err
	}

or

	if err != nil {
		log.Infof("Ignoring filesystem analysis error: %v", err)
		return nil, nil
	}

I think it is a bug and the first suggestion should be used. If you want to move more carefully, I would be okay with adding the log message only.

Copy link
Contributor

@msugakov msugakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry that it took me so long to provide this feedback. I need more changes/conversations to understand what happens here.

@@ -34,6 +34,7 @@ var (
oracleReleaseRegexp = regexp.MustCompile(`(?P<os>Oracle) (Linux Server release) (?P<version>[\d]+)`)
centosReleaseRegexp = regexp.MustCompile(`(?P<os>[^\s]*) (Linux release|release) (?P<version>[\d]+)`)
redhatReleaseRegexp = regexp.MustCompile(`(?P<os>Red Hat Enterprise Linux) (Client release|Server release|Workstation release|release) (?P<version>[\d]+)`)
rhcosReleaseRegexp = regexp.MustCompile(`(?P<os>Red Hat Enterprise Linux) (CoreOS release) (?P<version>[\d]+[\.]?[\d]*)`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Maddosaurus , what you write in PR description definitely helps understand the intent.

As discussed in Thursdays sync meeting with @RTann, we explicitly return the version including minor (e.g. rhcos:4.11 as opposed to rhel:8).

However, it does not explain why. Could you please try to reconstruct the reasoning and put is as a

// <Short comment line with _why_> 

next to this item?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention is to attribute found vulnerabilities more exactly to the OpenShift version.

Comment on lines 30 to 36
if namespace != nil && (wellknownnamespaces.IsRHELNamespace(namespace.Name) || wellknownnamespaces.IsRHCOSNamespace(namespace.Name)) {
// This is a RHEL-based image that must be scanned in a certified manner.
// Use the RHELv2 scanner instead.
Copy link
Contributor

@msugakov msugakov Dec 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mentioned in the meeting is that wellknownnamespaces.IsRHELNamespace and wellknownnamespaces.IsRHCOSNamespace work differently when uncertified scanning is performed and so what is formulated in if condition now does not stack correctly against this comment below.

		// This is a RHEL-based image that must be scanned in a certified manner.
		// Use the RHELv2 scanner instead.

The difference in behavior is such that:

  • wellknownnamespaces.IsRHELNamespace is possible only when uncertified scanning is off. Otherwise, things go "centos" way and featureVersions are filled instead of RHELv2Components.
  • wellknownnamespaces.IsRHCOSNamespace will be irrespective of uncertified scanning setting and will always lead to RHELv2Components. However, in case of actually uncertified OS, we'll have some things like CPEs unset.

At minimum this requires a code comment explaining what happens and why. Most likely, this should be addressed as part of the same epic. In such case a // TODO(ROX-12345): blah comment is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the question if an uncertified RHCOS exists to our meeting question catalogue for today and created ROX-13906 to track and implement the changes (if needed). I am also adding a comment and ticket reference in code.

@Maddosaurus
Copy link
Contributor Author

/retest

Copy link
Collaborator

@RTann RTann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some comments to consider

// no matter what value uncertifiedRHEL is.
// This is in contrast to RHEL, where uncertifiedRHEL determines whether we scan for RHEL or CentOS.
// TODO(ROX-13906): Verify RHCOS logic
if namespace != nil && (wellknownnamespaces.IsRHELNamespace(namespace.Name) || wellknownnamespaces.IsRHCOSNamespace(namespace.Name)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe for another time, but perhaps we can make something like UseCertifiedWorkflow which checks if the namespace is rhel or rhcos. Minimizes all the this || that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good idea!
I have added a follow up ticket (ROX-14028) to track the refactoring.

// a different path that symlinks to var/lib/rpm.
// Since PR #1014, Scanner code follows symlinks , but usr/share/rpm will get filtered out
// as it is not a well known database dir unless defined here.
rhcosDbDir = "usr/share/rpm"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rhcosDBDir

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option is to do something like:

databaseDirs = []string{"var/lib/rpm", "usr/share/rpm"}

and have nested for loops through darabaseDirs and databaseFiles

Copy link
Contributor Author

@Maddosaurus Maddosaurus Dec 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose a separate var to improve understandability, combined with the detailed code comment explaining the var.
That said, I'm open to changing it to a nested loop if that fits the rest of the codebase better.
WDYT?

Comment on lines 47 to 48
// Since PR #1014, Scanner code follows symlinks , but usr/share/rpm will get filtered out
// as it is not a well known database dir unless defined here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent about an hour trying to understand this. Unrelated thing, but I found a security vulnerability https://issues.redhat.com/browse/ROX-14035.

I think what's described here is not how #1014 works. Moreover, #1014 did not actually touch the code which has this effect of skipping usr/share/rpm.

The reason why var/lib/rpm symlink is not followed is because node.filesMap (implementation of analyzer.Files interface) is populated initially via addFiles() function which uses filepath.WalkDir() library function.

// addFiles searches the directory for files using the matcher and adds them to the file map.
func (n *filesMap) addFiles(dir string, matcher matcher.Matcher, m *metrics.FileExtractionMetrics) error {
logrus.WithFields(logrus.Fields{
"root": n.root,
"directory": dir,
}).Info("add files from directory")
return filepath.WalkDir(filepath.Join(n.root, dir), func(fullPath string, entry fs.DirEntry, err error) error {

In the documentation of filepath.WalkDir() it is said:

WalkDir does not follow symbolic links.

Basically, WalkDir will see that var/lib/rpm is a symlink and will not traverse it.

On one hand, usr/share/rpm is a viable substitute for now, but in order to move with it long-term, we need to obtain a confirmation that it will stay as a location where CoreOS team will store RPMDB in all future versions.

On the other hand, var/lib/rpm is a convention and we'll do better job if we teach our code work with symlinks and respect conventions rather than add more and more special cases.
Unfortunately, dealing with symlinks is difficult and like many other difficult things isn't implemented in Go standard library. It is also additionally complicated by the fact that our symlinks have custom root. If there's no good library solving this problem (I wonder how is it solved in the container scanning!?), I suggest to get a promise from the CoreOS team that they maintain /usr/share/rpm as a contract and move on with this.

I'm fine with a TODO for now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch with the potential vuln! If we are to keep our ability to read symlinks, we should definitely handle any known potential vulnerability before releasing. Luckily, we have handled a similar case before when it came to reading ZIP files (see https://security.snyk.io/research/zip-slip-vulnerability for more information about ZIP-slip): https://github.com/stackrox/scanner/blob/2.27.3/pkg/ziputil/open.go#L60

If we keep the symlink stuff, then let's definitely ensure the file is under the root.

Similarly, I have been thinking about just simply looking at /usr/share/rpm instead, and preferring that over /var/lib/rpm when we know we are looking at RHCOS. Perhaps that could be a good followup after the MVP, so we don't delay this

@RTann RTann requested a review from jvdm December 19, 2022 20:00
Copy link
Contributor

@msugakov msugakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I hope the testing also completes successfully.

@Maddosaurus
Copy link
Contributor Author

Thanks for the suggestions, I have accepted them.
With the latest set of changes, we are still able to produce valid results on an OS 4.11 cluster.

I have pushed a set of images with the latest changes in this PR as well as the stackrox/stackrox PR 3722.
You can get a running deployment by doing the following (e.g. with a Kubeconfig pointing to an OS 4.11 cluster):

export MAIN_IMAGE_REPO=quay.io/mmeiding/playground
export MAIN_IMAGE_TAG=main-g87a5cdaf05-dirty

export CENTRAL_DB_IMAGE_TAG=3.73.x-176-g5e618a632d

export ROXCTL_IMAGE_REPO=quay.io/mmeiding/playground
export ROXCTL_IMAGE_TAG=roxctl-g87a5cdaf05-dirty

./deploy/openshift/deploy.sh

@Maddosaurus
Copy link
Contributor Author

/retest

Maddosaurus and others added 14 commits January 3, 2023 14:57
- Roll back RegEx change
- Introduce IsRHCOSNamespace
- RHCOS has its RPM DB in a different path
- RHCOS provides a CPE json in a different path
- Return error
- Add code comments & refs
- Add todo item
- Rename rhcosDBDir
- Removed RHCOS check for models
- Add tracking todo
- Specify code comments
Co-authored-by: Misha Sugakov <537715+msugakov@users.noreply.github.com>
Co-authored-by: Misha Sugakov <537715+msugakov@users.noreply.github.com>
@Maddosaurus Maddosaurus force-pushed the mm/rox-12967-fix-rhcos-detection branch from 8cd5f2a to 9fcc0a7 Compare January 3, 2023 13:58
@Maddosaurus Maddosaurus merged commit d5eb96c into master Jan 3, 2023
@Maddosaurus Maddosaurus deleted the mm/rox-12967-fix-rhcos-detection branch January 3, 2023 15:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants