The three github teams relevant to this repository are as follows (Note, you won't necessarily have access to see these links):
- adoptopenjdk-infrastructure - write access to the repository which lets you be an official approver of PRs (triage doesn't)
- infrastructure - higher level of access for system administrators only
- admin_infrastructure - The Admin team - can force through changes without approval etc.
All changes to the repository should be made via GitHub pull requests.
Reviews are required for PRs in this repository. In very special circumstances such as a clear breakage that has a simple fix available then a repository admin may override that requirement to push through a change if no reviewers are available, but in such cases a comment explaining why must be added to the Pull Request.
The full documentation for running locally is at [ansible/README.md].
Assuming you have ansible installed on your UNIX-based machine, clone this
repository, create an inventory
text file with the word localhost
and run this from the ansible
directory:
ansible-playbook -b -i inventory_file --skip-tags adoptopenjdk,jenkins_user playbooks/AdoptOpenJDK_Unix_Playbook/main.yml
NOTE: For windows machines you cannot use this method (i.e., as localhost) as ansible does not run natively on Windows
On an Ansible Control Node create an inventory file with the list of machines you want to set up, then
from the ansible
directory in this repository run something like this:
ansible-playbook -b -i inventory_file --skip-tags adoptopenjdk,jenkins_user playbooks/AdoptOpenJDK_Unix_Playbook/main.yml
If you don't have ssh logins enabled as root, add -b -u myusername
to the
command line which will ssh into the target machine as myusername
and use
sudo
to gain root access to do the work.
To do this you ideally need to be using key-based ssh logins. If you use a passphrase on your ssh key use the following to hold the credentials in the shell:
eval `` `ssh-agent` ``
ssh-add
and if using the -b
option, ensure that your user has access to sudo
without a password to the root
account (often done by adding it to the wheel
group)
In addition to the static build machines which we have, there are also Dockerfiles that are used to build the base images that our build farm uses for running docker based builds on some of our platforms - this is what we have at the moment:
Dockerfile | Image | Platforms | Where is this built? | In use? |
---|---|---|---|---|
Centos7 | adoptopenjdk/centos7_build_image |
linux on x64, arm64, ppc64le, armv7l | Jenkins | Yes |
Centos6 | adoptopenjdk/centos6_build_image |
linux/amd64 | GH Actions | Yes |
Alpine3 | adoptopenjdk/alpine3_build_image |
linux/amd64 | GH Actions | Yes |
Windows2016_Base | adoptopenjdk/windows2016_build_image:base |
windows/amd64 | GH Actions | No |
Windows2016_VS2017 | adoptopenjdk/windows2016_build_image:vs2017 |
windows/amd64 | GH Actions | No |
When a change lands into master, the relevant dockerfiles are built using
the appropriate CI system listed in the table above by configuring them with
the ansible playbooks and pushing them up to Docker Hub where they can be
consumed by our jenkins build agents when the DOCKER_IMAGE
value is
defined on the jenkins build pipelines as configured in the
pipeline_config files.
Other than the dependencies on the machines which come from packages shipped
with the operating system, we generally use individual roles for each piece
of software which we install on the machines. For the main Unix and Windows
playbooks each role has it's own directory and is called from the top level
main.yml
playbook. They are fairly easy to add and in most cases you can
look at an existing one and copy it.
As far as possibly, give each operation within the role a tag so that it can either be skipped if someone doesn't want it, or run on its own if desired.
If something is specific to the adoptopenjdk infrastructure (e.g. setting
host names, or configuring things specific to our setup but aren't required
to be able to run build/test operations) then give the entries in that role
an adoptopenjdk
tag as well. If you need to do something potentially
adjusting the users' system, use the dont_remove_system
tag. This is
occasionally required if, for example, we need a specific version of a tool
on the machine that is later than the default, and want to ensure that the
old version does not get invoked by default on the adoptopenjdk machines.
See
GIT_Source
as an example
The build triage team will frequently raise issues if they determine that a build failure is occurring on a particular system. Assuming it's not a "system is offline" issue you may wish to repliacte the build process locally. The easiest way to do this is as follows (ideally not as root as that can mask problems).
git clone https://github.com/adoptopenjdk/openjdk-build
cd openjdk-build/build-farm
export CONFIGURE_ARGS=--with-native-debug-symbols=none
export BUILD_ARGS="--custom-cacerts false"
./make-adopt-build-farm.sh jdk11u
(NOTE: jdk11u
is the default if nothing is specified)
The two export
lines are based on the options in the
Build FAQ
and speed up the process by not building the debug
symbols and not generating our own certificate bundles. For most problems,
neither are needed Look at the start of the script for other environment
variables that can be set control what is built - for example VARIANT
can
be set to openj9
and others instead of the default of hotspot
. The
script uses the appropriate environment configuration files under
build-form/platform-specific-configurations
to set some options.
Many infrastructure issues (generally
those tagged as testFail are raised
as the result of failing JDK tests which are believed to be problems
relating to the set up of our machines. In most cases it is useful to
re-run jobs using the jenkins
Grinder
jobs which lets you run almost any test on any machine which is connected to
jenkins. In most cases testFail
issues will have a link to the jenkins
job where the failure occurred. On that job there will be a "Rerun in
Grinder" link if you need to re-run the whole job (which will run lots of
tests and may take a while) or within the job you will find individual
Grinder re-run links for different test subsets. When you click them, you
can set the LABEL
to the name of the machine you want to run on if you
want to try and replicate it, or determine which machines it passes and
fails on.
For more information on test case diagnosis, there is a full Triage guide in the openjdk-tests repository
The values for TARGET
can be found in thte <testCaseName>
elements of
.the various playlist.xml
files in the test repositories. It can also be
jdk_custom
which case you should set the CUSTOM_TARGET
to the name of
an individual test for example:
test/jdk/java/lang/invoke/lambda/LambdaFileEncodingSerialization.java
If you then need to run manually on the machine itself (outside jenkins) then the process is typically like this:
git clone https://github.com/adoptopenjdk/openjdk-tests && cd openjdk-tests
./get.sh && cd TKG
export TEST_JDK_HOME=<path to JDK which you want to use for the tests>
BUILD_LIST=openjdk make compile
make <target>
BUILD_LIST
depends on the suite you want to run, and can be omitted to build
the tests for everything, but that make take a while and requires docker
to be available. Note that when building the system
suite, there must be
a java in the path to build the mauve tests. The final make command runs
the test - it is normally a valid Grinder TARGET
such as jdk_net
. There
is more information on running tests yourself in the
tests repository
A few examples that test specific pieces of infra-related functionality so useful to be aware of:
BUILD_LIST=functional
,CUSTOM_TARGET=_MBCS_Tests_pref_ja_JP_linux_0
BUILD_LIST=system
,CUSTOM_TARGET=_MachineInfo
BUILD_LIST=openjdk
,CUSTOM_TARGET=test/jdk/java/lang/invoke/lambda/LambdaFileEncodingSerialization.java
(en_US.utf8
locale required)BUILD_LIST=system
,TARGET=system.custom
CUSTOM_TARGET=-test=MixedLoadTest -test-args="timeLimit=5m"
(system_custom
was added in adoptium/aqa-tests#2234)
(For the last one, that makes use of the system.custom target added via this PR)
If you are making a change which might have a negative effect on the
playbooks on other platforms, be sure to run it through the
VagrantPlaybookCheck
job first. This job takes a branch from a fork of the
openjdk-infrastructure
repository as a parameter and runs the playbooks
against a variety of Operating Systems using Vagrant and the scripts in
pbTestScripts
to validate them.
The AdoptOpenJDK Jenkins server at https://ci.adoptopenjdk.net is used for all the
builds and testing automation. Since we're as open as possible, general read
access is enabled. For others, access is controlled via github teams (via
the Jenkins Github Authentication Plugin
as follows. (Links here won't work for
most people as the teams are restricted access)
- release can run and configure jobs and views
- build has the access for
release
plus the ability to create new jobs - testing has the same access as
build
- infrastructure has the same as
build
/testing
plus can manage agent machines - jenkins-admins as you might expect has access to Administer anything
Some jobs within jenkins, such as the
build signing job
and Release tool job
are restricted further via the Enable project-based security
section of
the job definition. In the case of those two in particular it's jenkins-admins
and
release
teams only who have access to them respectively.
To add a new system:
- Ensure there is an issue documenting its creation somewhere (Can just be an existing issue that you add the hostname too so it can be found later
- Obtain system from the appropriate infrastructure provider
- Add it to bastillion (requires extra privileges) so that all of the appropriate admin keys are deployed to the system (Can be delayed for expediency by putting AWX key into
~root/.ssh/authorized_keys
) - Create a PR to add the machine to inventory.yml (See NOTE at end of the list)
- Once merged, run the ansible scripts on it - ideally via AWX (Ensure the project and inventory sources are refreshed, then run the appropriate
Deploy **** playbook
template with aLIMIT
of the new machine name) - Add it to jenkins, verify a typical job runs on it if you can and add the appropriate tags
NOTE ref inventory: If you are adding a new type of machine (build
, perf
etc.) you should also add it to
adoptopenjdk_yaml.py
and, if it will be configured via the standard playbooks, add the new type to the
list at the top of the main playbook files for
*IX and
windows
The DockerStatic role was first implemented in this PR and extended through this issue and is intended to allow us to make more efficient use of larger machines. The role uses a set of Dockerfiles, one per distribution, which can be used to generate a set of docker machines that are started, exposed on an ssh port, and connected to jenkins. They only contain the minimum required to run test jobs and cannot be used for builds. These containers serve several purposes:
- We have some quite large machines, and this lets us split them up without full virtualisation
- It allows us to increase the number of distributions we test on
- We can run multiple tests in parallel on the host with isolation not available when multiple executors are used
The DockerStatic role sets up several containers each with a different distribution and exposed on a specific port. It also (at the time of writing) sets them up to be restricted to two cores and 6Gb of RAM which is adequate for most tests. On larger machines it may be appropriate to modify those values, and we should look at whether we can reasonably autodetect or parameterize those values. Potentially we could also scale up and create more than one of each OS on a given host. To set up a host for running these you can use the dockerhost.yml playbook (Also available via AWX).
Once the static docker containers have been created they are connected into
jenkins with a test-docker-
prefix. Ideally the description of the machine
in jenkins should list which host it's on, but you can also look up the
IP address of the test-docker-
system in the inventory.yml file to find
out the host. The inventory file does NOT contain any of the test-docker-
systems as they do not have the playbooks run against them. Normally
machines used for this purpose will be prefixed test-dockerhost-
to
identify them and split them out so they do not have the full playbooks
executed against them in order to keep the host system "clean". In some
cases they may be used as dockerBuild
hosts too.
Instructions on how to create a static docker container can be found here
- Set up patching cycle
- Identify ways to redeploy when needed to pick up updates
- Allow dockerhost.yml playbook to adjust core file settings
- Add mechanism to deploy differently based on host machine size
In some occasions non-infrastruture team members may wish to access a machine in order to reporoduce a test case failure, particularly if they do not have access to a machine of any given platform, or if the problem appears to be specific to a particular machine or cloud provider. In this case, the following procedure should be followed. Example commands are suitable for most UNIX-based platforms:
- User should raise a request for access using this template (in general, "Non-privilieged" is the correct option to choose
- Infrastructure team member doing the following steps should assign the issue to themselves
- For non-privilieged users, create an account with a GECOS field referencing the requester and issue number e.g.
useradd -m -c "Stewart Addison 1234" sxa
- Add the user's key to
.ssh/authorized_keys
on the machine with the user's public ssh key in it - Add a comment to the issue with the username and IP address details
- The issue should be left open until the user is finished with the machine (if it has been a while, ask them in the issue)
- Once user is finished, remove the ID (
userdel -r username
) - Close the issue