Change the caching mechanism to be based on JSON files #395
Description
Overview
We would like to create a JSON file for every container image layer.
The JSON file will represent the contents of an image layer, allowing Trivy to store this information for rescans and for caching. This will replace the current method, where we save either the entire image layer or specific files from it. The JSON-based approach requires less disk space and is also better suited to a client-server architecture, where caching is done on the server side.
JSON Schema
All schema elements are optional except "json-version", which is the only mandatory field.
{
  "json-version": "1.0",
  "digest": "234982938492384",
  "os-name": "RHEL",
  "os-version": "7.2",
  "os-packages": [
    {
      "name": "Pkg1",
      "version": "7.2",
      "release": "7",
      "epoch": 1,
      "arch": "x64"
    },
    {
      "name": "Pkg2",
      "version": "1.2",
      "release": "3",
      "epoch": 0,
      "arch": "x64"
    }
  ],
  "python": [
    {
      "file": "/app1/Pipfile.lock",
      "packages": [
        {
          "name": "Pkg1",
          "version": "7.2"
        },
        {
          "name": "Pkg2",
          "version": "1.2"
        }
      ]
    },
    {
      "file": "/app2/Pipfile.lock",
      "packages": [
        {
          "name": "Pkg3",
          "version": "3.2"
        },
        {
          "name": "Pkg4",
          "version": "5.6"
        }
      ]
    }
  ],
  "ruby": [
    {
      "file": "/app1/Gemfile.lock",
      "packages": [
        {
          "name": "Pkg1",
          "version": "7.2"
        },
        {
          "name": "Pkg2",
          "version": "1.2"
        }
      ]
    },
    {
      "file": "/app2/Gemfile.lock",
      "packages": [
        {
          "name": "Pkg1",
          "version": "7.2"
        },
        {
          "name": "Pkg2",
          "version": "1.2"
        }
      ]
    }
  ],
  "whiteout": [
    "/file1",
    "/file2"
  ],
  "opaque": [
    "/dir1",
    "/dir2"
  ]
}
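As a rough illustration of the "only json-version is mandatory" rule, a reader of these files could look something like the sketch below. The function name `load_layer`, the set of supported versions, and the choice to default the list-valued fields to empty lists are all assumptions made for this example, not part of the proposal.

```python
import json

# Assumption: "1.0" is the only schema version a reader currently understands.
SUPPORTED_VERSIONS = {"1.0"}

# List-valued fields from the schema above; all of them are optional.
LIST_FIELDS = ("python", "ruby", "whiteout", "opaque")


def load_layer(text: str) -> dict:
    """Parse one layer JSON document and check the single mandatory field."""
    doc = json.loads(text)
    if "json-version" not in doc:
        raise ValueError("missing mandatory field 'json-version'")
    if doc["json-version"] not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported json-version: {doc['json-version']!r}")
    # Optional list fields default to empty so callers can iterate safely.
    for field in LIST_FIELDS:
        doc.setdefault(field, [])
    return doc
```

"os-packages" is deliberately not defaulted here, so that a layer which says nothing about OS packages can be told apart from a layer that reports an empty package list.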
Merging JSONs
Since a container image has multiple layers, and each layer will have its own JSON file, we need to merge the JSON files into one JSON that represents the files and packages of the overall image.
The merge function should get an ordered list of JSON files (from layer#1 to layer#n) and merge them into one JSON file that can be used for the scanning.
While merging we should take into consideration cases like deleted files, deleted packages, and deleted directories.
Here are the merge rules:
- If multiple layers have a list of os-packages, take the list of packages only from the last layer. This will ensure we support cases where an os-package might have been deleted on higher layers.
- If multiple layers have programming language lock files, check whether these files were deleted by higher layers. If they were, remove the programming language packages of the deleted files from the merged JSON.
- Like the previous rule, but for cases where a complete directory was deleted.
- If multiple layers have the same programming language file, take the programming language packages from the highest layer that contains that file.
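The merge rules above could be sketched roughly as follows. This is only an illustration under a few assumptions: the name `merge_layers` is hypothetical, each layer is already parsed into a dict, "whiteout" entries are treated as exact file paths and "opaque" entries as directory prefixes, and the merged document is stamped with json-version "1.0".

```python
# Language sections that appear in the schema example.
LANG_KEYS = ("python", "ruby")


def _is_deleted(path, whiteout, opaque):
    """True if `path` is a whited-out file or lives under an opaque directory."""
    if path in whiteout:
        return True
    return any(path.startswith(d.rstrip("/") + "/") for d in opaque)


def merge_layers(layers):
    """Merge layer dicts, ordered from layer #1 (base) to layer #n (top)."""
    os_info = {}
    files = {key: {} for key in LANG_KEYS}  # language -> {lock file: packages}

    for layer in layers:
        whiteout = layer.get("whiteout", [])
        opaque = layer.get("opaque", [])

        # Deleted files and directories: drop lock files that came from
        # lower layers and were removed by this layer.
        for by_path in files.values():
            for path in [p for p in by_path if _is_deleted(p, whiteout, opaque)]:
                del by_path[path]

        # Same lock file in several layers: the higher layer wins.
        for key in LANG_KEYS:
            for entry in layer.get(key, []):
                files[key][entry["file"]] = entry["packages"]

        # OS packages: keep only the list from the last layer that has one.
        if "os-packages" in layer:
            for k in ("os-name", "os-version", "os-packages"):
                if k in layer:
                    os_info[k] = layer[k]

    merged = {"json-version": "1.0", **os_info}  # version value is an assumption
    for key in LANG_KEYS:
        if files[key]:
            merged[key] = [
                {"file": path, "packages": pkgs} for path, pkgs in files[key].items()
            ]
    return merged
```

Deletions are applied before the current layer's own lock files are added, so a file that a layer re-creates inside a directory it marked as opaque is kept.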