Prototype "Reproducible Build" using current available Adopt jenkins job & infra framework and "build info" #2594
Comments
The Jenkins build jobs are currently "generated" whenever changes occur in openjdk-build or ci-jenkins-pipelines. The jobs also contain an extensive amount of JSON configuration/settings; both the job parameters and the JSON settings are specific to the build setup at the time of the release/build.
|
Pointing the Adopt build job's USER build script settings at the SHA of the level of the openjdk-build scripts you want to use (e.g. the January 19th CPU Update) is quite easy to do. Here is a build of the January jdk-11.0.10+9 Hotspot release: https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-linux-x64-hotspot/960/parameters/
|
Docker does have its limitations, however. The use of dockerhub is one potential security exposure: relying on storing/pulling images over the internet to build a secure JDK is not ideal. |
Rebuilding Jan CPU Update with the current Jenkins job:
DEFAULTS_JSON:
In the "defaultsUrl" location:
The build_branch SHA is the openjdk-build scripts commit used to build the Jan CPU update. Note that DOCKER_IMAGE uses the @sha256:56... SHA digest of the docker image used to build. I didn't have the exact SHA digest, but this could be used if we did store it. |
A diffoscope comparison of the original Release jdk.tar.gz and the Reproduced one shows many differences. This is not really surprising, as timestamps differ everywhere, and any file lists or zip contents (e.g. jmods) have their files in different orders.
Reproduced "lib":
The only two files differing in size are "modules" and "src.zip":
|
Some jmods Release vs Reproduce:
Minor size differences. |
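Since the diffoscope output above is dominated by timestamps and file ordering, it can help to first ask which files differ in content at all. The following is a minimal Python sketch of that idea, not the tooling used for the comparison above: the function names are made up for illustration, and nested archives such as jmods or src.zip are still compared as opaque bytes, so ordering differences inside them will still show up.

```python
import hashlib
import io
import tarfile


def content_digests(tar_bytes: bytes) -> dict[str, str]:
    """Map each regular file in a tar(.gz) archive to the SHA-256 of its content."""
    digests = {}
    # "r:*" lets tarfile detect the compression (gzip, bz2, none) transparently.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as archive:
        for member in archive:
            if member.isfile():
                data = archive.extractfile(member).read()
                digests[member.name] = hashlib.sha256(data).hexdigest()
    return digests


def differing_files(tar_a: bytes, tar_b: bytes) -> list[str]:
    """List files whose content differs, ignoring member order and timestamps."""
    a, b = content_digests(tar_a), content_digests(tar_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Run against the Release and Reproduced tarballs, a result limited to files like "modules" and "src.zip" would match the size comparison above.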
There are several important aspects of "Reproducibility" that all work together:
|
Leading on from the "important aspects", we ought to clarify the "Requirements" again with those aspects in mind, and also the reproducible-builds.org "How?":
1. "Re-build" a release, re-producing the "same output"
2. Security (access/auth) of the "infra" and all servers used for storing "dependencies"
3. Auditable "change management" on all "infra", "tooling" and "dependency servers"
|
Prototype setting up a new ubuntu 20.04 jdk17 hotspot build environment within a docker container:
As you can see, installing "git" alone installs 56 new pkgs. Do we need to record all of those? Dependency scanning? We could install all the required dependencies, then simply "scan"/"list" ALL the pkgs and "record" those as "build info". What does that achieve? I'm not sure it's reproducible, as it is not easy to re-build with that exact set of 100s of specific versions. |
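The "scan and record" idea above could look roughly like the Python sketch below. It assumes the package listing is captured as "name version" lines (for example from `dpkg-query -W -f='${Package} ${Version}\n'` on Ubuntu); the `packages` key and the function names are hypothetical and not an existing Adopt "build info" format.

```python
import json


def parse_package_list(listing: str) -> dict[str, str]:
    """Parse 'name version' lines into a {name: version} mapping."""
    packages = {}
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            name, version = parts
            packages[name] = version
    return packages


def to_build_info(packages: dict[str, str]) -> str:
    """Serialise the package set deterministically (sorted keys) as JSON."""
    # "packages" is an illustrative key, not a defined schema.
    return json.dumps({"packages": packages}, indent=2, sort_keys=True)
```

Recording the list this way documents what was installed; as noted above, it does not by itself make it easy to re-create that exact set of versions later.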
An excellent project working on and discussing various approaches and issues with reproducible builds: https://wiki.yoctoproject.org/wiki/Reproducible_Builds |
From the evidence, my thinking is that the difficulty of achieving "exact binary reproducibility", and keeping it there, doesn't deliver enough benefit for the cost. The continual effort to fix upstream and other tooling products that might be introducing a single timestamp where you don't want it is simply not a good use of effort. We ought to concentrate on the benefits that "reproducible builds" aims to achieve, e.g. a trusted supply chain. |
Since git is not directly used as part of the build itself (another git version will generally extract the same source, whereas a different version of the build tools will not necessarily produce the same output), I'd tentatively say no (at least as a first pass), although recording all versions used for a build is probably useful. |
openjdk16 at a minimum is required for "reproducible" Java apps, as prior to that things like jmods/jars had random "hashes" |
strace -ff -e trace=openat cat /etc/vdpau_wrapper.cfg
stap -e 'probe syscall.open { printf ("UID %d: %s(%d) open (%s)\n", uid(), execname(), pid(), argstr) }' |
Tooling:
apt list -a :
|
I had a go at installing and running SystemTap, unsuccessfully: there were issues with the environment and with compiling it, and it looks like a system debug image is required. It seems a bit too low level for something we would use as part of a build. |
strace is far easier to use; however, the output from running a JDK build is huge, as a trace file is created per process and 1000s of processes are created to perform a JDK build. Here is a tiny section of the output from grep'ing for "include" within a "delta" re-build of a JDK when changing just a single .c file:
It does illustrate the various System headers picked up from places like /usr/include, /usr/lib, /usr/local/include, ... |
The strace test illustrates the problem of knowing exactly what system files are used to compile with. It is not practical to log, verify and determine the version of every single opened .h file for example. |
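To give a feel for what post-processing the strace output involves, here is a minimal Python sketch that pulls successfully opened header files out of `strace -e trace=openat` lines. The regular expression assumes the usual `openat(AT_FDCWD, "path", FLAGS) = fd` line shape and does not handle escaped quotes in paths; the function name is made up for illustration.

```python
import re

# Matches e.g.: openat(AT_FDCWD, "/usr/include/stdio.h", O_RDONLY|O_NOCTTY) = 3
OPENAT = re.compile(r'openat\([^,]+, "([^"]+)",[^)]*\)\s*=\s*(-?\d+)')


def opened_headers(trace_lines):
    """Return the sorted set of .h files that were opened successfully."""
    headers = set()
    for line in trace_lines:
        match = OPENAT.search(line)
        if not match:
            continue
        path, result = match.group(1), int(match.group(2))
        # A negative result (e.g. -1 ENOENT) is a failed include-path probe.
        if result >= 0 and path.endswith(".h"):
            headers.add(path)
    return sorted(headers)
```

This only yields file paths. Mapping each path back to a package and version is the part described above as impractical at full-build scale.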
@sxa I've done a smaller delta build with strace, adding execve to the strace, which produces the list of executables nicely:
packagelist.txt:
filesnotinpackage.txt:
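As a rough illustration of how two lists like these might be derived from the execve trace, here is a Python sketch. It is not the script that produced the files above: `owner_of` is a hypothetical lookup supplied by the caller (on Ubuntu it might wrap `dpkg -S <path>`), and the regular expression assumes the usual `execve("path", [...], ...) = 0` line shape.

```python
import re
from typing import Callable, Optional

# Matches e.g.: execve("/usr/bin/gcc", ["gcc", "-c", "a.c"], 0x7ffd... /* 20 vars */) = 0
EXECVE = re.compile(r'execve\("([^"]+)",.*\)\s*=\s*(-?\d+)')


def executed_paths(trace_lines):
    """Return the sorted set of executables that were started successfully."""
    paths = set()
    for line in trace_lines:
        match = EXECVE.search(line)
        if match and int(match.group(2)) == 0:  # skip failed PATH probes
            paths.add(match.group(1))
    return sorted(paths)


def split_by_package(paths, owner_of: Callable[[str], Optional[str]]):
    """Split paths into (owning packages, paths owned by no package)."""
    packages, unowned = set(), []
    for path in paths:
        owner = owner_of(path)  # hypothetical lookup, returns None if unowned
        if owner is None:
            unowned.append(path)
        else:
            packages.add(owner)
    return sorted(packages), unowned
```

Files that no package owns (for example a locally built compiler under /usr/local) are the ones that would need separate tracking.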
|
An equivalent of strace for non-Linux platforms (Mac and Windows) is somewhat of an issue. |
Working with jdk-18 that has had some reproducible build PRs included, with support for SOURCE_DATE_EPOCH, source date can be "fixed" using eg:
Testing on x64 Linux Ubuntu, two local re-builds showed JDK image differences in several files, including .jmods and .zips. I have created a fork which resolves the issues caused by timestamps. This left just 1 file that differs, the generated 64-bit CDS archive file: server/lib/cds_classes_nocoops.jsa |
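The timestamp and ordering effects described above can be demonstrated in isolation. The Python sketch below is not how the JDK build or the fork does it; it only shows the general technique: take a fixed time from SOURCE_DATE_EPOCH, stamp every archive entry with it, and add entries in sorted order. The function names and the 1980-01-01 fallback (the earliest date the zip format can store) are choices made for this illustration.

```python
import io
import os
import time
import zipfile


def epoch_from_env(default: int = 315532800) -> int:
    """Read SOURCE_DATE_EPOCH, falling back to 1980-01-01T00:00:00Z."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", default))


def build_zip(files: dict[str, bytes], epoch: int) -> bytes:
    """Build a zip whose bytes depend only on file names, contents and epoch."""
    date_time = time.gmtime(epoch)[:6]  # (year, month, day, hour, min, sec) in UTC
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for name in sorted(files):  # fixed order, independent of input order
            info = zipfile.ZipInfo(name, date_time=date_time)
            archive.writestr(info, files[name])
    return buf.getvalue()
```

Calling `build_zip(files, epoch_from_env())` twice on the same machine with the same inputs should give byte-identical archives, whereas stamping entries with the current time or adding them in directory-listing order would not.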
-Xlog:cds output for differing CDS archives:
CDS-2:
|
As can be seen, the CDS-2 content changed differently between the 1st and 2nd -Xshare:dump. The JDK build builds 2 CDS archives, cds_classes.jsa and cds_classes_nocoops.jsa, in that order. The 1st one, cds_classes.jsa, is always identical; the 2nd one, cds_classes_nocoops.jsa, is usually different:
|
The region addresses are the same for both CDS archives, which means the 2nd -Xshare:dump is probably being affected in a non-deterministic way by the first dump? |
Found an existing JDK bug for this non-deterministic CDS dump problem; I've added a comment with some of my details: https://bugs.openjdk.java.net/browse/JDK-8253495?focusedCommentId=14450892&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14450892 |
Hi @sxa @andrew-m-leonard, I'm an Outreachy applicant and opened this issue following the project description (Secure Software Supply Chain Enhancements, codename "RASPberry"). I'm a bit lost in making sense of this issue with regard to the project. Can you help me get started? Thank you! |
@SehrishHussain Hi Sehrish, this issue covers quite a lot. For the CycloneDX SBOM initial task, I have just created a new sub-issue here: #2753 |
Two Jenkins builds of jdk-18 on MacOS using my latest patches show the CDS archive as the only difference:
Building on my local laptop as a different-environment test, to see how dependent we are on EXACT dependencies and compiler levels:
|
Building reproducibly on 2 separate x64 Linux environments seems very difficult, mainly because the "native" libraries differ, e.g. Build1 = Jenkins CentOS 6 docker environment, Build2 = local VirtualBox Ubuntu 20.04 VM. Even when building with exactly the same gcc 7.5.0, all the .so libraries differ considerably. |
This is a work item for EPIC #2522.
The aim is to "prototype" an initial attempt at "recreating" an Adopt build by specifying the basic "build info" currently available for an existing "release", assuming the currently available "infra" setup.
Design:
Assumptions for 1st prototype:
Spec:
Extend the current openjdk-pipeline & build jobs to take as extra inputs basic "build info":