New Machine requirement: Ansible AWX #3339

Closed
Tracked by #3292
sxa opened this issue Jan 10, 2024 · 13 comments

Comments

@sxa
Member

sxa commented Jan 10, 2024

I need to request a new machine:

  • New machine operating system (e.g. linux/windows/macos/solaris/aix): linux
  • New machine architecture (e.g. x64/aarch32/arm32/ppc64/ppc64le/sparc): n/a (Maybe ppc64le?)
  • Provider (leave blank if it does not matter):
  • Desired usage: replacement for the AWX server which is hosted on Equinix at present
  • Any unusual specification/setup required:
  • How many of them are required: 1

Please explain what this machine is needed for:

@sxa
Member Author

sxa commented Jan 15, 2024

Testing on an OSUOSL aarch64 machine (140.211.169.54) - 2 core, 4GB, 30GB, with a 100GB volume attached for /Vendor_Files which can hopefully be mapped into the awx-task container (if not, it can become /var/lib/docker)

@sxa
Member Author

sxa commented Jan 15, 2024

Steps:

  1. apt-get update && apt-get upgrade because it's a good place to start. apt install joe because I used it (Not strictly required)
  2. Use the 100GB volume to create a 5GB swap partition and the remainder for /Vendor_Files. Add to /etc/fstab (/home/awx/Vendor_files?)
  3. apt install docker.io
  4. TO BE DONE: Follow instructions on https://github.com/adoptium/infrastructure/wiki/Ansible-AWX - https://ansible.readthedocs.io/projects/awx-operator/en/latest/user-guide/advanced-configuration/custom-volume-and-volume-mount-options.html

@sxa
Member Author

sxa commented Jan 16, 2024

Noting that as of version 18 (Our existing server is 15.0.1) the install process is not via docker but is intended to be done with the new AWX Operator which requires kubernetes - testing with minikube.

@sxa sxa self-assigned this Jan 23, 2024
@sxa
Member Author

sxa commented Feb 13, 2024

Current status:

  • Switched to k3s on my own box for testing. Initial attempt failed because the process was pulling down x64 docker images (I suspect there are no aarch64 ones, so the production node is likely going to have to be x64)
  • Noting that kubectl describe pod ... was giving me better information than kubectl logs initially
  • Used qemu+docker to allow the containers to install with make deploy on the 2.10.0 version of the awx-operator repository (2.11.0 didn't seem to work) as per the AWX operator docs, but this took several hours.
  • Once it completed, connecting to port 443 as per the output of kubectl describe service gave a 404 page not found error.
  • I still haven't identified a solution for mapping /Vendor_Files into the container
  • Note that I'm running with KUBECONFIG=/etc/rancher/k3s/k3s.yaml in the environment but that file needs to be readable by non-root since I'm running as an awx user - the permissions currently get reset on restart. EDIT: Based on information in this doc I have adjusted the startup command in /etc/systemd/system/k3s.service to include --write-kubeconfig-mode=640 and done a one-off chgrp awx /etc/rancher/k3s/k3s.yaml

@sxa
Member Author

sxa commented Feb 13, 2024

For Vendor Files it has been suggested that we use an execution environment (a feature included in AWX) which can be based on the https://github.com/ansible/awx-ee repository.

Alternatively we may be able to copy into the pod using kubectl cp. For checking what has been done you can use kubectl exec e.g. kubectl exec -n awx awx-adoptium-task-... -- df -k. Add --stdin --tty if you want to run an interactive process such as a shell.
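A minimal sketch of the kubectl exec / kubectl cp approach described above. The pod name suffix is not known in advance, so the helper pick_task_pod (a hypothetical name) filters it out of the pod list. The cluster commands only run when kubectl is present and a task pod is found, and the kubectl cp line is left as a comment because the local source path is an assumption. Note that kubectl cp relies on tar being available inside the container.

```shell
# Pick the first pod whose name contains "-task-" from the output of
# `kubectl get pods -o name` (lines of the form "pod/<name>").
pick_task_pod() {
  sed -n 's|^pod/\(.*-task-.*\)$|\1|p' | head -n 1
}

if command -v kubectl >/dev/null 2>&1; then
  pod="$(kubectl -n awx get pods -o name 2>/dev/null | pick_task_pod)"
  if [ -n "$pod" ]; then
    # Check what is mounted inside the task pod
    kubectl -n awx exec "$pod" -- df -k
    # Copy a local directory into the pod (local path is an assumption):
    #   kubectl cp ./Vendor_Files "awx/$pod:/Vendor_Files"
  fi
fi
```

For an interactive shell, the same exec call takes --stdin --tty before the pod name, as noted above.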

Also as an alternative to k3s if it misbehaves I've been pointed at these instructions for kind which I did try briefly initially: https://gist.github.com/fosterseth/081c05248975048beb55858def010866 although that seems more geared towards development (Similar to running it on minikube)

@sxa
Member Author

sxa commented Feb 14, 2024

Created Ubuntu 22.04/x64 machine adoptium-awx in Azure with IP 172.187.93.97 (default user azureuser for now).

@sxa
Member Author

sxa commented Feb 14, 2024

Installation steps:

  • Run the installation script from https://get.k3s.io as root with --write-kubeconfig-mode 644 as a parameter
  • Install the make package
  • useradd -m awx
  • chgrp awx /etc/rancher/k3s/k3s.yaml
  • Edit /etc/systemd/system/k3s.service to add --write-kubeconfig-mode=640 to the startup command (Note it may be possible to set this during the install)
  • Add export KUBECONFIG=/etc/rancher/k3s/k3s.yaml to awx's .bashrc
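As a possible alternative to hand-editing k3s.service, K3s also reads options from /etc/rancher/k3s/config.yaml, so the kubeconfig mode could be set there before or after the install. This is a sketch rather than what was done on this host: CONF_DIR is a hypothetical variable that defaults to a scratch directory so the snippet is safe to run as a non-root user.

```shell
# Sketch: persist the kubeconfig file mode through the K3s config file.
# CONF_DIR is hypothetical; on the real host it would be /etc/rancher/k3s
# and writing to it requires root.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/config.yaml" <<'EOF'
write-kubeconfig-mode: "0640"
EOF

cat "$CONF_DIR/config.yaml"

# On the real host the group change from the steps above is still needed,
# followed by a restart (both as root):
#   chgrp awx /etc/rancher/k3s/k3s.yaml
#   systemctl restart k3s
```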

As the awx user:

  • git clone https://github.com/ansible/awx-operator -b 2.12.0 (Latest version at the time of writing)
  • cd awx-operator
  • make deploy 2>&1 | tee awx-deploy.log
  • Create kustomization.yaml as per the docs then run kubectl apply -k .
  • Create awx-adoptium.yaml (e.g. copy from awx-demo.yml), reference it from kustomization.yaml in the resources section after the github line and run kubectl apply -k . again.
  • Run kubectl logs -f deployments/awx-operator-controller-manager -c awx-manager -n awx to check deployment status or kubectl -n awx get pods to see what has been created. For any problems, kubectl describe pod ... may be useful. It should complete in about 15 minutes.
  • Run kubectl get secret awx-adoptium-admin-password -n awx -o jsonpath='{.data.password}' | base64 --decode to get the password for the admin user, and kubectl -n awx get service to determine the port number for the awx-adoptium-service or kubectl -n awx get all to get more information about the deployment.
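The two files from the steps above might look roughly like the following. This is a sketch modelled on the awx-operator documentation and its awx-demo.yml, not a copy of the files on the server. WORKDIR is a hypothetical scratch directory, the resource name awx-adoptium is inferred from the secret and service names above, and service_type: nodeport is simply what the demo file uses.

```shell
# Sketch only: write the AWX resource and the kustomization that references it.
WORKDIR="${WORKDIR:-$(mktemp -d)}"
TAG=2.12.0   # operator version named in the steps above

# AWX custom resource, based on awx-demo.yml with the name changed
cat > "$WORKDIR/awx-adoptium.yaml" <<'EOF'
---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx-adoptium
spec:
  service_type: nodeport
EOF

# Kustomization: operator manifests first, then the AWX resource
cat > "$WORKDIR/kustomization.yaml" <<EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - github.com/ansible/awx-operator/config/default?ref=$TAG
  - awx-adoptium.yaml
images:
  - name: quay.io/ansible/awx-operator
    newTag: $TAG
namespace: awx
EOF

cat "$WORKDIR/kustomization.yaml"
# Then, from that directory as the awx user:
#   kubectl apply -k .
```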

@sxa
Member Author

sxa commented Feb 14, 2024

Placeholder for HTTPS information. Note that we will need to ensure these are exposed on the public interface of the azure host:

The example at https://github.com/kurokobo/awx-on-k3s/blob/main/base/awx.yaml (referenced in that repo's README.md, which points to this SSL guide) shows the ingress section of the configuration file, which is used to specify the hostname and certificate details. Full documentation is at https://ansible.readthedocs.io/projects/awx-operator/en/latest/user-guide/network-and-tls-configuration.html#ingress-type

  ingress_type: ingress
  ingress_hosts:
    - hostname: awx.example.com
      tls_secret: awx-secret-tls
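The ingress fragment above sits under spec: in the AWX resource file, and the tls_secret it names has to exist in the same namespace. A small sketch of how the two pieces could fit together is below. The helper ingress_spec and the file names tls.crt and tls.key are hypothetical, and the hostname is the placeholder from the example rather than the real one.

```shell
# Print the ingress portion of an AWX spec for a hostname ($1) and TLS secret ($2).
ingress_spec() {
  cat <<EOF
spec:
  ingress_type: ingress
  ingress_hosts:
    - hostname: $1
      tls_secret: $2
EOF
}

ingress_spec awx.example.com awx-secret-tls

# The secret would be created from an existing certificate and key, e.g.:
#   kubectl -n awx create secret tls awx-secret-tls --cert=tls.crt --key=tls.key
```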

SSL now configured. References:

@sxa
Member Author

sxa commented Feb 22, 2024

Next steps:

Some other references (may be useful to persist projects over restart):
  • Set up GitHub authentication

  • Configure bastillion to deploy new passphrase-enabled ssh key everywhere

  • Configure the rest of the tasks and automation

    • Ensure the ansible galaxy community stuff is available to avoid problems with homebrew and zypper in the curl and GIT_Source roles. Add a roles/requirements.yaml with the appropriate values to load:
    • Ensure that the "access" tab is set to everyone in the AdoptOpenJDK group (Note, the organisation name comes from the github team used for authentication, hence why it is not "Adoptium")
  • Ensure that Windows deployments using the Visual Studio layout from Vendor_Files are working.

  • Enable ssh logins for the infrastructure team

  • Add to the inventory file (inventory: add Azure x64 awx.adoptium.net #3422)

  • Stop using the old server

I've also set up a 1GB /swapfile and activated it as the server seemed to be struggling with the 4GB of RAM which it was configured with. Performance of the overall system isn't great but it may be throttling from being a burstable Azure system.
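For reference, a swap file like the one described above is typically set up with the commands below. This is a sketch, not a record of what was run. It is a dry run by default (it only prints the commands), and the run helper and DRY_RUN variable are hypothetical names for this example.

```shell
# Dry-run by default: prints each command instead of executing it.
# Set DRY_RUN=0 and run as root to execute for real.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

SWAPFILE=/swapfile

run fallocate -l 1G "$SWAPFILE"   # reserve 1GB
run chmod 600 "$SWAPFILE"         # swap files must not be world-readable
run mkswap "$SWAPFILE"            # write the swap signature
run swapon "$SWAPFILE"            # activate it

# To persist across reboots (assumption: the comment above does not say
# whether this was done), an /etc/fstab entry would be needed:
#   /swapfile none swap sw 0 0
```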

@sxa
Member Author

sxa commented Feb 28, 2024

Size changed from B2s to D2s_v3 (Increase from 4 to 8GB RAM) and the performance is looking a lot better.

@sxa
Member Author

sxa commented Feb 28, 2024

Note - I'm having to skip the Jenkins_User role for now in the Unix playbook as the correct information is not yet accessible from the Vendor_Files map.

@sxa
Member Author

sxa commented Mar 7, 2024

I'm going to close this as the system is now up and running. I will have a separate task to try and document this somewhere (replacing the docs in the wiki I expect, but maybe more widely). We will also need to look at testing upgrades at some point so we can get round the non-scrolling log issue.

@sxa
Member Author

sxa commented Apr 9, 2024

New 128GB disk added in the Azure console, formatted one partition on it and mounted it at /Vendor_Files.
Contents copied across from the old AWX server: ssh 147.75.100.121 docker exec awx_task tar cf - -C / Vendor_Files | tar xpf -
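The copy above is a tar pipe: one tar writes the archive to stdout on the old server and a second tar unpacks it from stdin locally, preserving permissions (the p flag). Because the receiving tar unpacks relative to the current directory, it would need to be run from / for the files to land in /Vendor_Files. A local stand-in using scratch directories (SRC and DEST are hypothetical) shows the same shape without needing ssh or docker:

```shell
# Local stand-in for the ssh/docker tar pipe, using two scratch directories.
SRC="$(mktemp -d)"
DEST="$(mktemp -d)"
mkdir -p "$SRC/Vendor_Files/sub"
echo demo > "$SRC/Vendor_Files/sub/file.txt"
chmod 640 "$SRC/Vendor_Files/sub/file.txt"

# Same shape as:
#   ssh HOST docker exec awx_task tar cf - -C / Vendor_Files | tar xpf -
# with -C on the receiving side standing in for "cd / first".
tar cf - -C "$SRC" Vendor_Files | tar xpf - -C "$DEST"

ls -l "$DEST/Vendor_Files/sub/file.txt"
```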
