
Change the caching mechanism to be based on JSON files #395

Closed
@knqyf263

Description

Overview

We would like to create a JSON file for every container image layer.
The JSON file will represent the contents of an image layer, allowing Trivy to store this information for rescans and caching. This will replace the current method, where we save either the entire image layer or specific files from it. The JSON-based approach requires less disk space and is also better suited to a client-server architecture, where caching is done on the server side.

JSON Schema

All schema elements are optional except "json-version", which is the only mandatory field.

{
	"json-version": "1.0",
	"digest": "234982938492384",
	"os-name": "RHEL",
	"os-version": "7.2",
	"os-packages": [
		{
			"name": "Pkg1",
			"version": "7.2",
			"release": "7",
			"epoch": 1,
			"arch": "x64"
		},
		{
			"name": "Pkg2",
			"version": "1.2",
			"release": "3",
			"epoch": 0,
			"arch": "x64"
		}
	],
	"python": [
		{
			"file": "/app1/Pipfile.lock",
			"packages": [
				{
					"name": "Pkg1",
					"version": "7.2"
				},
				{
					"name": "Pkg2",
					"version": "1.2"
				}
			]
		},
		{
			"file": "/app2/Pipfile.lock",
			"packages": [
				{
					"name": "Pkg3",
					"version": "3.2"
				},
				{
					"name": "Pkg4",
					"version": "5.6"
				}
			]
		}
	],
	"ruby": [
		{
			"file": "/app1/Gemfile.lock",
			"packages": [
				{
					"name": "Pkg1",
					"version": "7.2"
				},
				{
					"name": "Pkg2",
					"version": "1.2"
				}
			]
		},
		{
			"file": "/app2/Gemfile.lock",
			"packages": [
				{
					"name": "Pkg1",
					"version": "7.2"
				},
				{
					"name": "Pkg2",
					"version": "1.2"
				}
			]
		}
	],
	"whiteout": [
		"/file1",
		"/file2"
	],
	"opaque": [
		"/dir1",
		"/dir2"
	]
}
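To illustrate the "only json-version is mandatory" rule, here is a minimal Python sketch of a loader for one layer file. The function name `parse_layer` and the choice to default missing list fields to empty lists are assumptions made for illustration; the proposal itself does not specify how a reader should treat absent fields.

```python
import json

# List-valued fields taken from the example schema above (all optional).
OPTIONAL_LIST_KEYS = ("os-packages", "python", "ruby", "whiteout", "opaque")


def parse_layer(text):
    """Parse one layer JSON document, requiring only "json-version"."""
    doc = json.loads(text)
    if not isinstance(doc, dict):
        raise ValueError("layer JSON must be an object")
    if "json-version" not in doc:
        raise ValueError('missing mandatory field "json-version"')
    # Assumption: absent optional list fields are treated as empty lists,
    # so callers do not need to check for their presence.
    for key in OPTIONAL_LIST_KEYS:
        doc.setdefault(key, [])
    return doc


# A layer that changed nothing of interest is still a valid document.
minimal = parse_layer('{"json-version": "1.0"}')
print(minimal["os-packages"])  # []
```

Defaulting to empty lists keeps downstream code simple, but it also erases the difference between "this layer has no os-packages list" and "this layer has an empty one". A merge step that depends on that difference should work on the raw parsed object instead.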

Merging JSONs

Since a container image will have multiple layers, and each layer will have its own JSON file, we need to merge the JSON files into one JSON that represents the overall image files/packages.

The merge function should get an ordered list of JSON files (from layer#1 to layer#n) and merge them into one JSON file that can be used for the scanning.
While merging, we should take into account cases such as deleted files, deleted packages, and deleted directories.

Here are the merge rules:

  1. If multiple layers have a list of os-packages, take the list only from the last such layer. This ensures we handle cases where an os-package was deleted in a higher layer.
  2. If multiple layers have programming language lock files, check that these files were not deleted by higher layers. If a file was deleted, remove its programming language packages from the merged JSON.
  3. Like rule 2, but for cases where a complete directory was deleted.
  4. If multiple layers have the same programming language file, take the packages from the highest layer that contains that file.
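
The four rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual Trivy implementation: the name `merge_layers` is hypothetical, "whiteout" entries are taken to be exact file paths, "opaque" entries are taken to hide everything beneath that directory in lower layers, and `json-version`, `os-name` and `os-version` are carried over from the highest layer that sets them (the rules above do not say how to merge those fields).

```python
LANG_KEYS = ("python", "ruby")  # language sections from the example schema


def merge_layers(layers):
    """Merge layer dicts, ordered from layer 1 (lowest) to layer n (highest)."""
    merged = {}
    lock_files = {key: {} for key in LANG_KEYS}  # key -> {file path: packages}

    for layer in layers:
        # Assumption: scalar fields follow the highest layer that sets them.
        for key in ("json-version", "os-name", "os-version"):
            if key in layer:
                merged[key] = layer[key]
        # Rule 1: a later os-packages list replaces any earlier one.
        if "os-packages" in layer:
            merged["os-packages"] = layer["os-packages"]

        # Deletions in this layer apply to files collected from lower layers.
        deleted_files = set(layer.get("whiteout", []))
        deleted_dirs = [d.rstrip("/") + "/" for d in layer.get("opaque", [])]
        for files in lock_files.values():
            for path in list(files):
                # Rule 2: file deleted; rule 3: containing directory deleted.
                if path in deleted_files or any(
                    path.startswith(d) for d in deleted_dirs
                ):
                    del files[path]

        # Rule 4: the same lock file in a higher layer overrides the lower one.
        for key in LANG_KEYS:
            for entry in layer.get(key, []):
                lock_files[key][entry["file"]] = entry["packages"]

    for key, files in lock_files.items():
        if files:
            merged[key] = [{"file": f, "packages": p} for f, p in files.items()]
    return merged


layers = [
    {"json-version": "1.0",
     "os-packages": [{"name": "Pkg1", "version": "7.2"}],
     "python": [{"file": "/app1/Pipfile.lock",
                 "packages": [{"name": "Pkg1", "version": "7.2"}]}]},
    {"json-version": "1.0",
     "whiteout": ["/app1/Pipfile.lock"],
     "ruby": [{"file": "/app2/Gemfile.lock",
               "packages": [{"name": "Pkg2", "version": "1.2"}]}]},
]
result = merge_layers(layers)
```

In this example the second layer deletes `/app1/Pipfile.lock`, so `result` has no "python" section, while the "os-packages" list from the first layer and the "ruby" entry from the second are kept. Deletions are applied before a layer's own lock files are added, so a layer that marks a directory as opaque and then adds new files under it keeps those new files.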

Labels: kind/deprecation (Categorizes issue or PR as related to a feature/enhancement marked for deprecation.)