ci: add automated and on demand testing of fluence #49
Merged
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,97 @@ | ||
#!/bin/bash | ||
|
||
# This will test fluence with two jobs. | ||
# We use jobs because they generate output and complete, whereas bare pods | ||
# are expected to keep running (and would then error). | ||
|
||
set -eEu -o pipefail | ||
|
||
# ensure upstream exists | ||
# This test script assumes fluence image and sidecar are already built | ||
make prepare | ||
|
||
# Keep track of root directory to return to | ||
here=$(pwd) | ||
|
||
# A pull policy of Never ensures we use our loaded (just built) images | ||
cd upstream/manifests/install/charts | ||
helm install \ | ||
  --set scheduler.image=ghcr.io/flux-framework/fluence:latest \ | ||
  --set scheduler.sidecarimage=ghcr.io/flux-framework/fluence-sidecar:latest \ | ||
  --set scheduler.pullPolicy=Never \ | ||
  --set scheduler.sidecarPullPolicy=Never \ | ||
  schedscheduler-plugins as-a-second-scheduler/ | ||
|
||
# These containers should already be loaded into the local cluster (kind in CI) | ||
echo "Sleeping 10 seconds waiting for scheduler deploy" | ||
sleep 10 | ||
kubectl get pods | ||
|
||
# This will get the fluence pod (which has scheduler and sidecar), which should be first | ||
fluence_pod=$(kubectl get pods -o json | jq -r '.items[0].metadata.name') | ||
echo "Found fluence pod ${fluence_pod}" | ||
|
||
# Show logs for debugging, if needed | ||
echo | ||
echo "⭐️ kubectl logs ${fluence_pod} -c sidecar" | ||
kubectl logs ${fluence_pod} -c sidecar | ||
echo | ||
echo "⭐️ kubectl logs ${fluence_pod} -c scheduler-plugins-scheduler" | ||
kubectl logs ${fluence_pod} -c scheduler-plugins-scheduler | ||
|
||
# We now want to apply the examples | ||
cd ${here}/examples/test_example | ||
|
||
# Apply both example jobs | ||
kubectl apply -f fluence-job.yaml | ||
kubectl apply -f default-job.yaml | ||
|
||
# Get them based on associated job | ||
fluence_job_pod=$(kubectl get pods --selector=job-name=fluence-job -o json | jq -r '.items[0].metadata.name') | ||
default_job_pod=$(kubectl get pods --selector=job-name=default-job -o json | jq -r '.items[0].metadata.name') | ||
|
||
echo | ||
echo "Fluence job pod is ${fluence_job_pod}" | ||
echo "Default job pod is ${default_job_pod}" | ||
sleep 10 | ||
|
||
# Shared function to check output | ||
function check_output { | ||
  check_name="$1" | ||
  actual="$2" | ||
  expected="$3" | ||
  if [[ "${expected}" != "${actual}" ]]; then | ||
    echo "Check ${check_name} failed" | ||
    echo "Expected output is ${expected}" | ||
    echo "Actual output is ${actual}" | ||
    exit 1 | ||
  fi | ||
} | ||
|
||
# Get output (and show) | ||
default_output=$(kubectl logs ${default_job_pod}) | ||
default_scheduled_by=$(kubectl get pod ${default_job_pod} -o json | jq -r .spec.schedulerName) | ||
echo | ||
echo "Default scheduler pod output: ${default_output}" | ||
echo " Scheduled by: ${default_scheduled_by}" | ||
|
||
fluence_output=$(kubectl logs ${fluence_job_pod}) | ||
fluence_scheduled_by=$(kubectl get pod ${fluence_job_pod} -o json | jq -r .spec.schedulerName) | ||
echo | ||
echo "Fluence scheduler pod output: ${fluence_output}" | ||
echo " Scheduled by: ${fluence_scheduled_by}" | ||
|
||
# Check output explicitly | ||
check_output 'check-fluence-output' "${fluence_output}" "potato" | ||
check_output 'check-default-output' "${default_output}" "not potato" | ||
check_output 'check-default-scheduled-by' "${default_scheduled_by}" "default-scheduler" | ||
check_output 'check-fluence-scheduled-by' "${fluence_scheduled_by}" "fluence" | ||
|
||
# But events tell us what actually happened, so let's parse through them and find our pods | ||
# This tells us the Event -> reason "Scheduled" and who it was reported by. | ||
reported_by=$(kubectl events --for pod/${fluence_job_pod} -o json | jq -c '[ .items[] | select( .reason | contains("Scheduled")) ]' | jq -r '.[0].reportingComponent') | ||
check_output 'reported-by-fluence' "${reported_by}" "fluence" | ||
|
||
# And the second should be the default scheduler, but reportingComponent is empty and we see the | ||
# result in the source -> component | ||
reported_by=$(kubectl events --for pod/${default_job_pod} -o json | jq -c '[ .items[] | select( .reason | contains("Scheduled")) ]' | jq -r '.[0].source.component') | ||
check_output 'reported-by-default' "${reported_by}" "default-scheduler" |
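
The script above pauses with fixed `sleep 10` calls before reading pod names and logs, which can be too short on a slow runner. A minimal sketch of a polling alternative follows. The helper name `retry_until` and its arguments are hypothetical and not part of this PR; it simply re-runs whatever command it is given until that command succeeds or the attempt budget runs out.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR): retry a command until it
# succeeds or the attempt budget is exhausted.
# Usage: retry_until <max_attempts> <delay_seconds> <command> [args...]
retry_until() {
    local max_attempts="$1"
    local delay="$2"
    shift 2
    local attempt=1
    while (( attempt <= max_attempts )); do
        # Run the command as given; success ends the loop immediately.
        if "$@"; then
            return 0
        fi
        sleep "${delay}"
        attempt=$(( attempt + 1 ))
    done
    echo "Gave up after ${max_attempts} attempts: $*" >&2
    return 1
}
```

One possible use, assuming the jobs have already been applied, is `retry_until 30 2 kubectl wait --for=condition=complete job/fluence-job --timeout=5s` in place of a fixed sleep.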
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,139 @@ | ||
name: fluence build test | ||
|
||
on: | ||
  pull_request: [] | ||
  # Test on demand (dispatch) or once a week, Sunday | ||
  # We combine the builds into one job to simplify not needing to share | ||
  # containers between jobs. We also don't want to push unless the tests pass. | ||
  workflow_dispatch: | ||
  schedule: | ||
    - cron: '0 0 * * 0' | ||
|
||
jobs: | ||
  build-fluence: | ||
    env: | ||
      container: ghcr.io/flux-framework/fluence | ||
    runs-on: ubuntu-latest | ||
    name: build fluence | ||
    steps: | ||
      - uses: actions/checkout@v4 | ||
      - uses: actions/setup-go@v3 | ||
        with: | ||
          go-version: ^1.19 | ||
|
||
      - name: Build Containers | ||
        run: | | ||
          make prepare | ||
          make build REGISTRY=ghcr.io/flux-framework SCHEDULER_IMAGE=fluence | ||
|
||
      - name: Save Container | ||
        run: docker save ${{ env.container }} | gzip > fluence_latest.tar.gz | ||
|
||
      - name: Upload container artifact | ||
        uses: actions/upload-artifact@v4 | ||
        with: | ||
          name: fluence | ||
          path: fluence_latest.tar.gz | ||
|
||
  build-sidecar: | ||
    env: | ||
      container: ghcr.io/flux-framework/fluence-sidecar | ||
    runs-on: ubuntu-latest | ||
    name: build sidecar | ||
    steps: | ||
      - uses: actions/checkout@v4 | ||
      - uses: actions/setup-go@v3 | ||
        with: | ||
          go-version: ^1.19 | ||
|
||
      - name: Build Container | ||
        run: | | ||
          make prepare | ||
          make build-sidecar REGISTRY=ghcr.io/flux-framework SIDECAR_IMAGE=fluence-sidecar | ||
|
||
      - name: Save Container | ||
        run: docker save ${{ env.container }} | gzip > fluence_sidecar_latest.tar.gz | ||
|
||
      - name: Upload container artifact | ||
        uses: actions/upload-artifact@v4 | ||
        with: | ||
          name: fluence_sidecar | ||
          path: fluence_sidecar_latest.tar.gz | ||
|
||
  test-fluence: | ||
    needs: [build-fluence, build-sidecar] | ||
    permissions: | ||
      packages: write | ||
    env: | ||
      fluence_container: ghcr.io/flux-framework/fluence | ||
      sidecar_container: ghcr.io/flux-framework/fluence-sidecar | ||
|
||
    runs-on: ubuntu-latest | ||
    name: test fluence | ||
    steps: | ||
      - uses: actions/checkout@v4 | ||
      - uses: actions/setup-go@v3 | ||
        with: | ||
          go-version: ^1.20 | ||
|
||
      - name: Download fluence artifact | ||
        uses: actions/download-artifact@v4 | ||
        with: | ||
          name: fluence | ||
          path: /tmp | ||
|
||
      - name: Download fluence_sidecar artifact | ||
        uses: actions/download-artifact@v4 | ||
        with: | ||
          name: fluence_sidecar | ||
          path: /tmp | ||
|
||
      - name: Load Docker images | ||
        run: | | ||
          ls /tmp/*.tar.gz | ||
          docker load --input /tmp/fluence_sidecar_latest.tar.gz | ||
          docker load --input /tmp/fluence_latest.tar.gz | ||
          docker image ls -a | grep fluence | ||
|
||
      - name: Create Kind Cluster | ||
        uses: helm/kind-action@v1.5.0 | ||
        with: | ||
          cluster_name: kind | ||
          kubectl_version: v1.28.2 | ||
          version: v0.20.0 | ||
|
||
      - name: Load Docker Containers into Kind | ||
        env: | ||
          fluence: ${{ env.fluence_container }} | ||
          sidecar: ${{ env.sidecar_container }} | ||
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
        run: | | ||
          kind load docker-image ${fluence} | ||
          kind load docker-image ${sidecar} | ||
|
||
      - name: Test Fluence | ||
        run: /bin/bash ./.github/test.sh | ||
|
||
      - name: Tag Weekly Images | ||
        run: | | ||
          # YEAR-MONTH-DAY or YYYY-MM-DD | ||
          tag=$(date +%Y-%m-%d) | ||
          echo "Tagging and releasing ${{ env.fluence_container}}:${tag}" | ||
          docker tag ${{ env.fluence_container }}:latest ${{ env.fluence_container }}:${tag} | ||
          echo "Tagging and releasing ${{ env.sidecar_container}}:${tag}" | ||
          docker tag ${{ env.sidecar_container }}:latest ${{ env.sidecar_container }}:${tag} | ||
|
||
      # If we get here, tests pass, and we can deploy | ||
      - name: GHCR Login | ||
        if: (github.event_name != 'pull_request') | ||
        uses: docker/login-action@v2 | ||
        with: | ||
          registry: ghcr.io | ||
          username: ${{ github.actor }} | ||
          password: ${{ secrets.GITHUB_TOKEN }} | ||
|
||
      - name: Deploy Containers | ||
        if: (github.event_name != 'pull_request') | ||
        run: | | ||
          docker push ${{ env.fluence_container }} --all-tags | ||
          docker push ${{ env.sidecar_container }} --all-tags |
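
The "Tag Weekly Images" step derives a `YYYY-MM-DD` tag from the current date and applies it to both images. The same logic can be sketched as two small shell functions. The names `date_tag` and `tagged_ref` are hypothetical and not part of this PR, and using UTC is an assumption made here so the tag does not depend on the runner's time zone.

```shell
#!/bin/bash
# Hypothetical helpers (not part of the PR) for the weekly date tag.

# Print today's date as YYYY-MM-DD. UTC (-u) is an assumption made here
# so the result does not depend on the runner's time zone.
date_tag() {
    date -u +%Y-%m-%d
}

# Print "<image>:<tag>" for an image name. The tag defaults to today's
# date tag when no second argument is given.
tagged_ref() {
    local image="$1"
    local tag="${2:-$(date_tag)}"
    echo "${image}:${tag}"
}
```

With these, a step could run something like `docker tag "${image}:latest" "$(tagged_ref "${image}")"` for each image, where `image` stands for one of the container names defined in the job's `env`.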
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
apiVersion: batch/v1 | ||
kind: Job | ||
metadata: | ||
  name: default-job | ||
spec: | ||
  template: | ||
    spec: | ||
      schedulerName: default-scheduler | ||
      containers: | ||
      - name: default-job | ||
        image: busybox | ||
        command: [echo, not, potato] | ||
      restartPolicy: Never | ||
  backoffLimit: 4 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
apiVersion: batch/v1 | ||
kind: Job | ||
metadata: | ||
  name: fluence-job | ||
spec: | ||
  template: | ||
    spec: | ||
      schedulerName: fluence | ||
      containers: | ||
      - name: fluence-job | ||
        image: busybox | ||
        command: [echo, potato] | ||
      restartPolicy: Never | ||
  backoffLimit: 4 |
Note that scheduling pods with kube-scheduler and Fluence on the same cluster isn't supported. There isn't currently any way to propagate pod-to-node mappings generated by kube-scheduler to Fluence.
It's important that `kubectl apply -f fluence-job.yaml` is executed before `kubectl apply -f default-job.yaml`, and that they don't specify limits or requests so they could be scheduled on the same node. That's currently the case in this PR, but I'm emphasizing it for posterity.

Regardless, there still may be some funky race condition that occurs and results in unschedulable pods.
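
The two constraints in this comment (apply the Fluence job first, and keep requests and limits out of both manifests) could be guarded in a script. The following is only a sketch: the function names and the `KUBECTL` override variable are hypothetical, and the `grep` check is a plain text match on `requests:` or `limits:` keys rather than a real YAML parse.

```shell
#!/bin/bash
# Hypothetical helpers (not part of the PR).

# Succeed if a manifest contains a "requests:" or "limits:" key.
# This is a text match, not a YAML parse.
has_resource_constraints() {
    local manifest="$1"
    grep -Eq '^[[:space:]]*(requests|limits):' "${manifest}"
}

# Apply manifests strictly in the order given, refusing any manifest
# that sets requests or limits. KUBECTL is a hypothetical override,
# useful for dry runs (for example KUBECTL=echo).
apply_in_order() {
    local manifest
    for manifest in "$@"; do
        if has_resource_constraints "${manifest}"; then
            echo "Refusing ${manifest}: it sets requests or limits" >&2
            return 1
        fi
        ${KUBECTL:-kubectl} apply -f "${manifest}"
    done
}
```

For the files in this PR the call would be `apply_in_order fluence-job.yaml default-job.yaml`, which keeps the Fluence job first. This does not remove the race condition mentioned above; it only makes the stated preconditions explicit.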
Gotcha - I think likely for the testing cluster (and the example we already had in main) we are just doing that, putting them on the same node, and since it's a tiny kind or otherwise local cluster, there hasn't been an issue. If we extended this to an actual setup, there would be. This is an important point and I've opened an issue for emphasizing it in future docs: #53. Maybe we can think of a creative way to allow for both, possibly with kueue resource flavors that create distinct (separate) resources that are labeled for each.