Preparing 0.8.6 (#26)
* Updated Version to 0.8.5.

* Formatting.

* Fix Serve long running test (ray-project#8223)

* Fix release 0.8.5 tests for PPO torch Breakout. (ray-project#8226)

* Remove logging (ray-project#8211)

* [BRING BACK TO MASTER] Fix cluster.yaml config.

* [rllib] Copy plasma memory before adding data to replay buffer

* [sgd] Resource limit lift for GPU test (ray-project#8238)

* Fix resource_ids_ data race (ray-project#8253)

* [rllib] [hotfix] Remove assert that trips on pytorch multiagent (ray-project#8241)

* [BRING BACK TO MASTER] add torch download for rllib regression test.

* [serve] Master actor fault tolerance (ray-project#8116)

* [serve] Add delete_backend call (ray-project#8252)

* Fix resource_ids_ data race (ray-project#8253)

* [serve] Add delete_endpoint call (ray-project#8256)

* [serve] Refactor BackendConfig (ray-project#8202)

* Delete example files.

* Fix serve long running test (ray-project#8268)

* [tune] Avoid breakage - soft deprecation warning for search algs (ray-project#8258)

* [tune] Hotfix Ax breakage when fixing backwards-compat (ray-project#8285)

* Async actor microbenchmark Script (ray-project#8275)

* [core] Disable GCS actor management (ray-project#8271)

* Pin redis-py version (ray-project#8290)

* [BRING BACK TO MASTER] add pip install upgrade to the command.

* Add ipython as dependency for autoscaler container (ray-project#8297)

Co-authored-by: rbusche <rbusche@inserve.de>

* Revert "Async actor microbenchmark Script (ray-project#8275)"

This reverts commit 6a6eead.

* Docs and LINT.

* [RLlib] Increasing reusability v0 (#8)

* Set up CI with Azure Pipelines

Specifically, we are setting up a Travis-like ADO pipeline following what is already present in the .travis.yml file in the root of the repo.

* Separating travis like pipeline from main pipeline

* Adding Jenkins jobs equivalent

* Making some improvements

* Adding validation of the upstream CI

* Disabling Tune and large memory tests

* Changing threshold for simple reservoir sampling test

* Addressing comments

* Updating Azure Pipelines with travis updates

* Updating Azure Pipelines with more travis updates

* Updating CI with new cpp worker tests

* Setting code owners

* Fixing the version number generation

* Making main pipeline also our release pipeline

* Updating Azure Pipelines with travis updates

* Fixing wheels test

* Fixing codeowners

* Updating Azure Pipelines with travis updates

* Bumping up MACOSX_DEPLOYMENT_TARGET

* Updating Azure Pipelines with travis updates

* Updating Azure Pipelines with travis updates

* Updating Azure Pipelines with travis updates

* Disabling Serve tests

* Making explicit which branches GitHubActions workflows should watch

* Disabling Ray serve tests

* Installing numpy explicitly

* consolidating Ray test steps in one yml

* Making worker set, apex and ppo a little bit more reusable for custom agents

* Making Dynamic TF policy more reusable

* Allow the actions dict to carry user data defined for the episodes

* Forcing RLlib tests to run always

* Making SAC model more extensible

* Adapting exploration API

* Reverting the random worker index change

* Making epsilon configurable

* Fixing method doc

* Fixing arguments check in reset_schedule

* Fixing per worker epsilon greedy

* Activating logs for failing test

* Making original_space check more robust

* Allow normalized-actions rescaling to happen outside RLlib

* Passing infos values from agents to callbacks

* Installing node js using a task

* Adding kwargs in TFModels

* Fixing npm and node in mac

* Fixing the num workers value passed

* Forcing RLlib tests

* Merging 0.8.5

* Running some RLlib test in custom agent

* Adding echo bazelisk

* Force CI

* Force CI

* Relaxing an installation

* Using container jobs

* Fixing container jobs

* Change base image for container job

* Install with sudo

* Exec with sudo

* Test

* Changing agent pool

* Remove python selection

* Fix version replacement

* Fix version replacement

* Trying Bazel

* Installing node with sudo

* Run all install as sudo

* Reverting sudo -s

* Fixing omitted param

* install python manually

* Fixing missing param

* Making NVM available

* Fix nvm installation

* Fix copy-paste

* renaming to req file

* fix typo

* Install JDK 8

* Install req in other jobs

* Install JDK with sudo

* Removing docker clean up

* Install Docker

* fix installation issue

* Adding azure package source

* Fix docker permissions

* Install jq

* downloading with sudo

* Install llvm as root

* Skipping flaky test

* copy artifacts as sudo

* Fix Bazel build in MacOS (#23)

* Fixing mac os building issue

* Bazelisk check

* Increase bazel version

* Fixing typos

* Update hash

* Include unzip

* Improved compilation and convergence tests

Added compilation tests that follow proper pytest conventions.
These tests use parametrized settings and allow multiple algorithms to be
tested with a single test.
I've commented out the tests these two can replace, to show the improvement.
Only about half of the algorithms have been transitioned to the new tests,
in the interest of keeping the PR small.

* Increasing bazel version

* Increasing bazel version only mac pipelines

* Printing system info in Ubuntu wheels pipeline

* making docker install optional

* Compilation and convergence tests for more algos

Added compilation and convergence tests for Apex DQN, Apex DDPG
Added convergence tests for SAC
Removed old (commented out) compilation test code from
`rllib.agents.dqn.tests.test_apex`

* Clean up

Deleted old (commented out) test code

* Updated BUILD file

Split tests into test_compilation.py and test_learning.py to work with Bazel BUILD files.

* Updated BUILD file

Fixed bug in BUILD - wrong files passed in.

* BugFix: Improper imports causing test failures

* BugFix: Improper imports causing test failures

* Removed test_appo from BUILD file

* Fixing copy-paste error

* Applying some bazel fixes

* Fixing installation issues

* Update hash

* Fixing NVM/NODE installation

* Applying latest changes in travis.yml

* Fixing fixture data exclusions

* Disable some java tests

* Adgudime/apex sac (#25)

* WIP: Compilation tests work

* Fixed bugs with Apex SAC continuous action spaces

* Bugfix: Bad imports

* Fixing PyArrow issue

* Fixing guava check

* Fix datetime java format

* Fixing Bazel issues finding or loading conftest

* Fixing pytest module loading order

* Trying different approach to pickle check

* Installing latest pickle5 explicitly

* Fixing conftest resolution

* Temporarily disabling pickle5 validation

* Fixing fixture data exclusions

* Fixing data files treated as src

* Disable some java tests

Co-authored-by: Edilmo Palencia <edilmo@gmail.com>

* Fix multiple CI errors

* Update hash

* Fixing more build issues

* Fixing more build issues

* Fix pipeline cache path

* More fixes

* Fix cache

* Fixing bazel test command

* Fix bazel test

* Allowing custom summarize episodes

* Adding custom metrics ops in exec plan

* Apex SAC exploration should be stochastic

* Letting DQN deal with reshaping for Discrete spaces

* Commenting the cache

Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Simon Mo <xmo@berkeley.edu>
Co-authored-by: Sven Mika <sven@anyscale.io>
Co-authored-by: ijrsvt <ian.rodney@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
Co-authored-by: Rüdiger Busche <rbusche@posteo.net>
Co-authored-by: rbusche <rbusche@inserve.de>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
Co-authored-by: Aditya Gudimella <aditya.gudimella@gmail.com>
13 people authored Aug 14, 2020
1 parent 051fdd8 commit 97782a6
Showing 108 changed files with 3,911 additions and 528 deletions.
2 changes: 1 addition & 1 deletion .bazelrc
@@ -78,7 +78,7 @@ test:asan --action_env=LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libasan.so.2
 # For example, for Ubuntu 18.04 libasan can be found here:
 # test:asan --action_env=LD_PRELOAD=/usr/lib/gcc/x86_64-linux-gnu/7/libasan.so
 
-test:ci --flaky_test_attempts=3
+test:ci --flaky_test_attempts=5
 test:ci --nocache_test_results
 test:ci --progress_report_interval=100
 test:ci --show_progress_rate_limit=100
1 change: 1 addition & 0 deletions .bazelversion
@@ -0,0 +1 @@
+3.3.0
37 changes: 25 additions & 12 deletions .github/CODEOWNERS
@@ -1,38 +1,51 @@
# Each line is a file pattern followed by one or more owners.
# See https://help.github.com/articles/about-codeowners/
# for more info about CODEOWNERS file

# It uses the same pattern rule for gitignore file,
# see https://git-scm.com/docs/gitignore#_pattern_format.

# ==== Ray default ====
# These owners will be the default owners for everything in
# the repo. Unless a later match takes precedence,
# @BonsaiAI/ray-code-owners will be requested for
# review when someone opens a pull request.
* @BonsaiAI/ray-code-owners


# ==== Ray core ====

# All C++ code.
/src/ray @ray-project/ray-core-cpp
/src/ray @BonsaiAI/ray-maintainers

# Python worker.
/python/ray/ @ray-project/ray-core-python
!/python/ray/tune/ @ray-project/ray-core-python
!/python/ray/rllib/ @ray-project/ray-core-python
/python/ray/ @BonsaiAI/ray-maintainers
!/python/ray/tune/ @BonsaiAI/ray-maintainers
!/python/ray/rllib/ @BonsaiAI/ray-maintainers

# Java worker.
/java/ @ray-project/ray-core-java
/java/ @BonsaiAI/ray-maintainers

# Kube Operator.
/deploy/ @BonsaiAI/ray-maintainers

# ==== Libraries and frameworks ====

# Ray tune.
/python/ray/tune/ @ray-project/ray-tune
/python/ray/tune/ @BonsaiAI/ray-code-owners

# RLlib.
/python/ray/rllib/ @ray-project/rllib
/python/ray/rllib/ @BonsaiAI/ray-code-owners
/rllib/ @BonsaiAI/ray-code-owners

# ==== Build and CI ====

# Bazel.
/BUILD.bazel @ray-project/ray-core
/WORKSPACE @ray-project/ray-core
/bazel/ @ray-project/ray-core
/BUILD.bazel @BonsaiAI/ray-code-owners
/WORKSPACE @BonsaiAI/ray-code-owners
/bazel/ @BonsaiAI/ray-code-owners

# CI scripts.
/.travis.yml @ray-project/ray-core
/ci/travis/ @ray-project/ray-core
/.travis.yml @BonsaiAI/ray-maintainers
/ci/ @BonsaiAI/ray-maintainers

10 changes: 9 additions & 1 deletion .github/workflows/jenkins.yml
@@ -3,7 +3,15 @@ name: Jenkins Integration
 env:
   DEBIAN_FRONTEND: noninteractive
 
-on: [push, pull_request]
+on:
+  push:
+    branches:
+      - master
+      - releases/*
+  pull_request:
+    branches:
+      - master
+      - releases/*
 
 jobs:
   jenkins:
10 changes: 9 additions & 1 deletion .github/workflows/main.yml
@@ -1,6 +1,14 @@
 name: CI
 
-on: [push, pull_request]
+on:
+  push:
+    branches:
+      - master
+      - releases/*
+  pull_request:
+    branches:
+      - master
+      - releases/*
 
 env:
   # Git GITHUB_... variables are useful for translating Travis environment variables
3 changes: 3 additions & 0 deletions .gitignore
@@ -179,3 +179,6 @@ tools/prometheus*
 # ray project files
 project-id
 .mypy_cache/
+
+# PyCharm
+.ijwb/
4 changes: 2 additions & 2 deletions bazel/ray_deps_setup.bzl
@@ -100,8 +100,8 @@ def ray_deps_setup():
 
     auto_http_archive(
         name = "bazel_common",
-        url = "https://github.com/google/bazel-common/archive/084aadd3b854cad5d5e754a7e7d958ac531e6801.tar.gz",
-        sha256 = "a6e372118bc961b182a3a86344c0385b6b509882929c6b12dc03bb5084c775d5",
+        url = "https://github.com/google/bazel-common/archive/bf87eb1a4ddbfc95e215b0897f3edc89b2254a1a.tar.gz",
+        sha256 = "dab4cbd634aae4bc9b116f4de5737e4d3c0754c3a1d712ad4a9b75140d278614",
     )
 
     auto_http_archive(
225 changes: 225 additions & 0 deletions ci/azure_pipelines/README.md
@@ -0,0 +1,225 @@
# Azure Pipelines

This folder contains the code required to create the Azure Pipelines for the CI/CD of the Ray project.
Keep in mind that this document could be outdated; check the following links if you need to update the procedure.
- [Azure virtual machine scale set agents](https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/scale-set-agents?view=azure-devops)
- [Repo for the Azure Pipelines images](https://github.com/actions/virtual-environments)

## Self-hosted Linux Agents

### Create VM Image

The following are the instructions to build the VM image of a self-hosted Linux agent using a Virtual Hard Drive (VHD).
The image is the same one used by the Microsoft-hosted Linux agents. This approach
simplifies maintenance and keeps the pipeline code compatible with both
types of agents.

Requirements:
- Install packer : https://www.packer.io/downloads.html
- Install azure-cli : https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest

Steps for Mac and Ubuntu:
- Clone the GitHub Actions virtual environments repo: `git clone https://github.com/actions/virtual-environments.git`
- Move into the folder of the repo cloned above: `pushd virtual-environments/images/linux`
- Log in to your Azure account: `az login`
- Set your Azure subscription id and tenant id:
- Check your subscriptions: `az account list --output table`
- Set your default (replace your Subscription id in the command): `az account set -s {Subscription Id}`
- Get the subscription id: `SUBSCRIPTION_ID=$(az account show --query 'id' --output tsv)`
- Get the tenant id: `TENANT_ID=$(az account show --query 'tenantId' --output tsv)`
- Select the azure location: `AZURE_LOCATION="eastus"`
- Create and select the name of the resource group where the Azure resources will be created:
- Set the group: `RESOURCE_GROUP_NAME="RayADOAgents"`
- Try to create the group. If the resource group exists, the details for it will be returned: `az group create -n $RESOURCE_GROUP_NAME -l $AZURE_LOCATION`
- Create a Storage Account:
- Set Storage Account name: `STORAGE_ACCOUNT_NAME="rayadoagentsimage"`
- Create the Storage Account: `az storage account create -n $STORAGE_ACCOUNT_NAME -g $RESOURCE_GROUP_NAME -l $AZURE_LOCATION --sku "Standard_LRS"`
- Create a Service Principal. If you have an existing Service Principal, it can also be used instead of creating a new one:
- Set the object id: `OBJECT_ID="http://rayadoagents"`
- Create client and get secret: `CLIENT_SECRET=$(az ad sp create-for-rbac -n $OBJECT_ID --scopes="/subscriptions/${SUBSCRIPTION_ID}" --query 'password' -o tsv)`. If the Principal already exists, this command returns the id of the role assignment; reuse your old password, or delete the existing Principal with `az ad sp delete --id $OBJECT_ID`.
- Get client id: `CLIENT_ID=$(az ad sp show --id $OBJECT_ID --query 'appId' -o tsv)`
- Set Install password: `INSTALL_PASSWORD="$CLIENT_SECRET"`
- Create a Key Vault. If you have an existing Key Vault, it can be used instead of creating a new one:
- Set Key Vault name: `KEY_VAULT_NAME="ray-agent-secrets"`
- Create the Key Vault: `az keyvault create --name $KEY_VAULT_NAME --resource-group $RESOURCE_GROUP_NAME --location $AZURE_LOCATION`. If the Key Vault already exists, this command returns its info.
- Set a GitHub Personal Access Token with rights to download:
- Set Key Pair name: `GITHUB_FEED_TOKEN_NAME="raygithubfeedtoken"`
- Upload your PAT to the vault (replace your token in the command): `az keyvault secret set --name $GITHUB_FEED_TOKEN_NAME --vault-name $KEY_VAULT_NAME --value "{GitHub Token}"`
- Get PAT from the Vault: `GITHUB_FEED_TOKEN=$(az keyvault secret show --name $GITHUB_FEED_TOKEN_NAME --vault-name $KEY_VAULT_NAME --query 'value' --output tsv)`
- Create the Managed Disk image:
- Create a packer variables file:
```
cat << EOF > azure-variables.json
{
  "client_id": "${CLIENT_ID}",
  "client_secret": "${CLIENT_SECRET}",
  "subscription_id": "${SUBSCRIPTION_ID}",
  "tenant_id": "${TENANT_ID}",
  "object_id": "${OBJECT_ID}",
  "location": "${AZURE_LOCATION}",
  "resource_group": "${RESOURCE_GROUP_NAME}",
  "storage_account": "${STORAGE_ACCOUNT_NAME}",
  "install_password": "${INSTALL_PASSWORD}",
  "github_feed_token": "${GITHUB_FEED_TOKEN}"
}
EOF
```
- Execute packer build: `packer build -var-file=azure-variables.json ubuntu1604.json`
For more details, [check the following doc in the virtual-environments repo](https://github.com/actions/virtual-environments/blob/master/help/CreateImageAndAzureResources.md).
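Before running packer, it can help to sanity-check the variables file: a missing or empty value tends to surface only later as an opaque Azure auth failure mid-build. A minimal sketch, assuming `jq` is installed (`check_packer_vars` is a hypothetical helper, not part of this repo):

```shell
# Validate that a packer variables file contains a non-empty value for every
# key the template expects. Returns non-zero and lists the offenders otherwise.
check_packer_vars() {
  file="$1"
  missing=0
  for key in client_id client_secret subscription_id tenant_id object_id \
             location resource_group storage_account install_password github_feed_token; do
    # `// empty` makes jq print nothing for absent or null keys
    value=$(jq -r --arg k "$key" '.[$k] // empty' "$file")
    if [ -z "$value" ]; then
      echo "missing or empty: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

Used as a guard, e.g. `check_packer_vars azure-variables.json && packer build -var-file=azure-variables.json ubuntu1604.json`.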
### Create Agent Pool
#### 1. Create the Virtual Machine Scale Set (VMSS)
Creation of the VMSS is done using the Azure Resource Manager (ARM) template `image/agentpool.json`. The following parameters are the most likely to need changing:
| Parameter | Description |
| ------------- | ------------- |
| vmssName | name of the VMSS to be created |
| instanceCount | number of VMs to create in the initial deployment (can be changed later) |
Steps for Mac and Ubuntu:
- Log in to your Azure account: `az login`
- Set your Azure subscription id and tenant id:
- Check your subscriptions: `az account list --output table`
- Set your default: `az account set -s {Subscription Id}`
- Get the subscription id: `SUBSCRIPTION_ID=$(az account show --query 'id' --output tsv)`
- Get the tenant id: `TENANT_ID=$(az account show --query 'tenantId' --output tsv)`
- Set Storage Account name (same that is above): `STORAGE_ACCOUNT_NAME="rayadoagentsimage"`
- Select the azure location: `AZURE_LOCATION="eastus"`
- Create and select the name of the resource group where the Azure resources will be created:
- Set the group: `RESOURCE_GROUP_NAME="RayADOAgents"`
- Try to create the group. If the resource group exists, the details for it will be returned: `az group create -n $RESOURCE_GROUP_NAME -l $AZURE_LOCATION`
- Create a Key Vault. If you have an existing Key Vault, it can be used instead of creating a new one:
- Set Key Vault name: `KEY_VAULT_NAME="ray-agent-secrets"`
- Create the Key Vault: `az keyvault create --name $KEY_VAULT_NAME --resource-group $RESOURCE_GROUP_NAME --location $AZURE_LOCATION`. If the Key Vault already exists, this command returns its info.
- Create a Key Pair in the Vault:
- Set Key Pair name: `SSH_KEY_PAIR_NAME="rayagentadminrsa"`
- Set public Key Pair name: `SSH_KEY_PAIR_NAME_PUB="${SSH_KEY_PAIR_NAME}pub"`
- Set SSH key pair file path: `SSH_KEY_PAIR_PATH="$HOME/.ssh/$SSH_KEY_PAIR_NAME"`
- Create the SSH key pair: `ssh-keygen -m PEM -t rsa -b 4096 -f $SSH_KEY_PAIR_PATH`
- Upload your key pair to the vault:
- Public part to be used by the VMs: `az keyvault secret set --name $SSH_KEY_PAIR_NAME_PUB --vault-name $KEY_VAULT_NAME --file ${SSH_KEY_PAIR_PATH}.pub`
- (Optional) Private part to be used by the VMs: `az keyvault secret set --name $SSH_KEY_PAIR_NAME --vault-name $KEY_VAULT_NAME --file $SSH_KEY_PAIR_PATH`
- Get public part from the Vault: `SSH_KEY_PUB=$(az keyvault secret show --name $SSH_KEY_PAIR_NAME_PUB --vault-name $KEY_VAULT_NAME --query 'value' --output tsv)`
- Create the VMSS:
- Set the Subnet Id of the subnet where the VMs must be: `SUBNET_ID="{Subnet Id}"`
- Set the VMSS name: `VMSS_NAME="RayPipelineAgentPoolStandardF16sv2"`
- Set the instance count: `INSTANCE_COUNT="2"`
- Get Reader role definition: `ROLE_DEFINITION_ID=$(az role definition list --subscription $SUBSCRIPTION_ID --query "([?roleName=='Reader'].id)[0]" --output tsv)`
- Set the source image VHD NAME (assuming the latest): `SOURCE_IMAGE_VHD_NAME="$(az storage blob list --subscription $SUBSCRIPTION_ID --account-name $STORAGE_ACCOUNT_NAME -c images --prefix pkr --query 'sort_by([], &properties.creationTime)[-1].name' --output tsv)"`
- Set the source image VHD URI: `SOURCE_IMAGE_VHD_URI="https://${STORAGE_ACCOUNT_NAME}.blob.core.windows.net/images/${SOURCE_IMAGE_VHD_NAME}"`
- Create the VM scale set: `az group deployment create --resource-group $RESOURCE_GROUP_NAME --template-file image/agentpool.json --parameters "vmssName=$VMSS_NAME" --parameters "instanceCount=$INSTANCE_COUNT" --parameters "sourceImageVhdUri=$SOURCE_IMAGE_VHD_URI" --parameters "sshPublicKey=$SSH_KEY_PUB" --parameters "location=$AZURE_LOCATION" --parameters "subnetId=$SUBNET_ID" --parameters "keyVaultName=$KEY_VAULT_NAME" --parameters "tenantId=$TENANT_ID" --parameters "roleDefinitionId=$ROLE_DEFINITION_ID" --name $VMSS_NAME`
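The latest-VHD selection above can be illustrated with the `az` call stubbed out. In this sketch, `latest_vhd_uri` is a hypothetical helper: it reads `name<TAB>creationTime` lines on stdin, as `az storage blob list` would report them for the `pkr*` blobs, picks the newest one, and composes the `sourceImageVhdUri` parameter passed to the ARM template:

```shell
# Pick the most recently created blob and build its full URI.
# ISO-8601 creation times sort correctly as plain strings, so a lexicographic
# sort on field 2 is enough.
latest_vhd_uri() {
  storage_account="$1"   # e.g. rayadoagentsimage
  name=$(sort -k2 | tail -n1 | cut -f1)
  echo "https://${storage_account}.blob.core.windows.net/images/${name}"
}
```

In practice you would pipe something like `az storage blob list ... --query '[].[name, properties.creationTime]' -o tsv` into it (query shape is an assumption; adjust to your `az` version).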
#### 2. Create the Agent Pool in Azure DevOps
Open Azure DevOps > "Project Settings" (bottom right) > "Agent Pools" > "New Agent Pool" > "Add pool" to create a new agent pool. Enter the agent pool's name, which must match the value you provided for VMSS_NAME (see steps above).
Make sure your admin is added as the administrator in ADO in 2 places:
- Azure DevOps > "Project Settings" (bottom right) > "Agent Pools" > [newly created agent pool] > "Security" tab, and
- Azure DevOps > bizair > Organization Settings > Agent Pools > Security
#### 3. Connect VMs to pool
Steps for Mac and Ubuntu:
- Copy some files to fix some errors in the generation of the agent image:
- The error is due to an issue with the packer script: it does not download a PostgreSQL installation script.
In order to check whether the image was fully built, connect to the VM using ssh (see steps below) and run: `INSTALLER_SCRIPT_FOLDER="/imagegeneration/installers" source /imagegeneration/installers/test-toolcache.sh`.
If you don't get any error message, skip the following 3 steps.
- Tar the image folder: `tar -zcvf image.tar.gz image`
- Set Key Pair name: `export SSH_KEY_PAIR_NAME="rayagentadminrsa"`
- Set SSH key pair file path: `export SSH_KEY_PAIR_PATH="$HOME/.ssh/$SSH_KEY_PAIR_NAME"`
- Set the IP of your VM: `export IP={my.ip}`
- Copy to each of your machines in the Scale set: `scp -o "IdentitiesOnly=yes" -i $SSH_KEY_PAIR_PATH ./image.tar.gz agentadmin@"${IP}":/home/agentadmin`
- Delete the tar: `rm image.tar.gz`
- Connect using ssh:
- Open a ssh tunnel: `ssh -o "IdentitiesOnly=yes" -i $SSH_KEY_PAIR_PATH agentadmin@"${IP}"`
- Fix the image:
- Untar the image file: `tar zxvf ./image.tar.gz`
- Switch to root: `sudo -s`
- On your machine, get the PAT from the Vault:
- Set Key Pair name: `export GITHUB_FEED_TOKEN_NAME="raygithubfeedtoken"`
- Set Key Vault name: `export KEY_VAULT_NAME="ray-agent-secrets"`
- Get the token: `az keyvault secret show --name $GITHUB_FEED_TOKEN_NAME --vault-name $KEY_VAULT_NAME --query 'value' --output tsv`
- Set the PAT in your ssh session: `export GITHUB_FEED_TOKEN={ GitHub Token }`
- Add agentadmin to the root group: `sudo gpasswd -a agentadmin root`
- Install missing part: `source ./image/fix-image.sh`
- Set the system up:
```
export GITHUB_FEED_TOKEN={ GitHub Token }
export DEBIAN_FRONTEND=noninteractive
export METADATA_FILE="/imagegeneration/metadatafile"
export HELPER_SCRIPTS="/imagegeneration/helpers"
export INSTALLER_SCRIPT_FOLDER="/imagegeneration/installers"
export BOOST_VERSIONS="1.69.0"
export BOOST_DEFAULT="1.69.0"
export AGENT_TOOLSDIRECTORY=/opt/hostedtoolcache
mkdir -p $INSTALLER_SCRIPT_FOLDER/node_modules
sudo chmod --recursive a+rwx $INSTALLER_SCRIPT_FOLDER/node_modules
sudo chown -R agentadmin:root $INSTALLER_SCRIPT_FOLDER/node_modules
source $INSTALLER_SCRIPT_FOLDER/hosted-tool-cache.sh
source $INSTALLER_SCRIPT_FOLDER/test-toolcache.sh
chown -R agentadmin:root $AGENT_TOOLSDIRECTORY
echo 'export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" # This loads nvm
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion" # This loads nvm bash_completion
AGENT_TOOLSDIRECTORY="/opt/hostedtoolcache/"' >> ~/.bashrc
```
- Go to the [New Agent] option in the pool and follow the instructions for linux agents:
- Download the agent: `wget https://vstsagentpackage.azureedge.net/agent/2.170.1/vsts-agent-linux-x64-2.170.1.tar.gz`
- Create and move to a directory for the agent: `mkdir myagent && cd myagent`
- Untar the agent: `tar zxvf ../vsts-agent-linux-x64-2.170.1.tar.gz`
- Configure the agent: `./config.sh`
- Accept the license.
- Enter your organization URL.
- Enter your ADO PAT.
- Set a Personal Access Token:
- Set Key Pair name: `ADO_TOKEN_NAME="rayagentadotoken"`
- Upload your PAT to the vault (replace your token in the command): `az keyvault secret set --name $ADO_TOKEN_NAME --vault-name $KEY_VAULT_NAME --value "{ADO Token}"`
- Enter the agent pool's name, which must match the value you provided for VMSS_NAME (see steps above)
- Enter or accept agent name.
- Install the ADO Agent as a service and start it:
- `sudo ./svc.sh install`
- `sudo ./svc.sh start`
- `sudo ./svc.sh status`
- Allow agent user to access Docker:
- `export VM_ADMIN_USER="agentadmin"`
- `sudo gpasswd -a "${VM_ADMIN_USER}" docker`
- `sudo chmod ga+rw /var/run/docker.sock`
- Update group permissions so docker is available without logging out and back in: `newgrp - docker`
- Test docker: `docker run hello-world`
- `export VM_ADMIN_USER="agentadmin"`
- If `/home/"$VM_ADMIN_USER"/.docker` exists:
- `sudo chown "$VM_ADMIN_USER":docker /home/"$VM_ADMIN_USER"/.docker -R`
- `sudo chmod ga+rwx "$HOME/.docker" -R`
- Create a symlink:
- `mkdir -p /home/agentadmin/myagent/_work`
- `ln -s /opt/hostedtoolcache /home/agentadmin/myagent/_work/_tool`
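A quick way to confirm the final symlink step took effect: if `_work/_tool` does not resolve to the hosted tool cache, every pipeline run re-installs its tools from scratch. A hedged sketch (`check_tool_symlink` is illustrative, not part of the repo):

```shell
# Verify a symlink exists and points at the expected target, e.g.
#   check_tool_symlink /home/agentadmin/myagent/_work/_tool /opt/hostedtoolcache
check_tool_symlink() {
  link="$1"
  target="$2"
  if [ -L "$link" ] && [ "$(readlink "$link")" = "$target" ]; then
    echo "ok: $link -> $target"
  else
    echo "bad or missing symlink: $link"
    return 1
  fi
}
```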
### Deleting an Agent Pool
1. Open Azure DevOps > Settings > Agent Pools > find pool to be removed and click "..." > Delete
2. Open Azure Portal > Key Vaults > ray-agent-secrets > Access Policies > delete the access policy assigned to the VMSS to be deleted
3. Open Azure Portal > All Resources > type the VMSS name into the search bar > select and delete the following resources tied to that VMSS:
- public IP address
- load balancer
- the VMSS itself
### Useful Commands
```
# Get connection info for all VMSS instances
az vmss list-instance-connection-info -g $RESOURCE_GROUP_NAME --name $VMSS_NAME

# SSH to a VMSS instance
ssh -o "IdentitiesOnly=yes" -i $SSH_KEY_PAIR_PATH agentadmin@{ PUBLIC IP}

# Download agentadmin private SSH key (formatting is lost if key is pulled from the UI)
az keyvault secret download --file $SSH_KEY_PAIR_PATH --vault-name $KEY_VAULT_NAME --name $SSH_KEY_PAIR_NAME


az keyvault secret download --file ~/downloads/PAT --vault-name $KEY_VAULT_NAME --name $ADO_TOKEN_NAME
```