z4ce · xavpaice · May 1, 2023
diff --git a/hips/hip-9999.md b/hips/hip-9999.md
@@ -1,81 +1,82 @@
 ---
 hip: 9999
-title: "New annotations for pre-install and pre-upgrade to fail fast and display output"
-authors: [ "Ian Zink <ian@replicated.com>" ]
+title: "New CLI switch for hooks to display output"
+authors: [ "Ian Zink <ian@replicated.com>", "Xav Paice <xav@replicated.com>" ]
 created: "2023-01-26"
 type: "feature"
 status: "draft"
 ---
 
 ## Abstract
 
-This proposes two new annotations for hooks specific to jobs. One that will cause install, upgrade, or test hooks to block and fail fast. And a second annotation to indicate that the output from the job should be displayed to the user.
+This HIP proposes a new CLI switch to indicate that the output from a job executed as a hook should be displayed to the user.
 
 
 ## Motivation
 
-The primary motivation for this HIP is the ability to run Preflight checks before a helm chart runs to verify it can successfully install. Preflight checks require a way to run and fail fast and to present that check that failed back to the user.
+The primary motivation for this HIP is the ability to run preflight checks before a Helm chart attempts to install permanent resources in the cluster, to verify that it can install successfully. Preflight checks need a way to run, to fail before the installation starts, and to present the results of the failed check back to the user.
 
-Often it is important to verify that the kubernetes cluster you are deploying a helm chart into has certain properties. You might need to know that the cluster is of a certain version to use various APIs. You might need to know that it has ingress available, a certain amount of ephemeral storage, memory, or CPUs available. You might want to validate the the service key they provided was correct or that that database they entered is reachable. Letting a chart deploy and then finding debugging to see why it failed is a poor user experience. These things can all be done with preflight checks enabled by the hooks proposed in this HIP.
-
-In general, allowing chart developers to run jobs and present that feedback directly to the users could also open up additional use cases beyond just the preflight use case that motivated this HIP.
+Often it is important to verify that the Kubernetes cluster you are deploying a Helm chart into has certain properties. You might need to know that the cluster is of a certain version to use various APIs. You might need to know that it has ingress available, or a certain amount of ephemeral storage, memory, or CPUs available. You might want to validate that the service key the user provided is correct or that the database they entered is reachable. Letting a chart deploy and then debugging to see why it failed is a poor user experience. These things can all be done with preflight checks run as hooks, with their output shown via the switch proposed in this HIP.
 
 
 ## Rationale
 
-There are other ways that this could be implemented. For example, we could have a separate preflight hook type. However, this new hook type wouldn't be handled at all by previous versions of helm. With this design, the new hooks will degrade into timeout errors instead of continuing to the install phase.
-
-Another strategy could be for helm to include Troubleshoot.sh as a dependent library, but this could result in too tight of a coupling between the projects and lower overall flexibility and adaptability. 
+The problem with running preflight checks as hooks today is that, to read the logs from the job, you must leave the resources created by the hook in place. Ideally, a failed preflight check would leave no trace of itself in the cluster. If Helm collected the hook output and displayed it to the user via stdout, install attempts could use the `--atomic` switch along with settings to delete the hook resources, and users would still have useful output from the failed hook.
 
+In general, allowing chart developers to run jobs and present that feedback directly to the users could also open up additional use cases beyond just the preflight use case that motivated this HIP.
 
 ## Specification
 
-Templates could include the following annotations on Batch Jobs:
-
-```yaml
-        "helm.sh/hook": pre-install, pre-upgrade
-        "helm.sh/hook-fail-fast": "true"
-        "helm.sh/show-output": "true"
-```
-
-`helm.sh/hook-fail-fast` would indicate that helm should wait for this job to complete and if it fails should immediately exit the install process.
-`helm.sh/show-output` would indicate that helm should display the output of the job to the user.
-
-Additionally a new user flag should be created `--ignore-fail-fast` that would ignore the results of the job and continue with the install process.
+When calling `helm install`, an additional CLI switch `--show-hook-logs` causes the command to write the logs from any pods created during hook execution to stdout at hook completion.
 
+There should be no need to follow the logs in real time; printing the entire log at completion is acceptable.
 ## Backwards compatibility
 
-As helm charts added new fail-fast hooks, old versions of helm would process them as if they were normal hooks. If `--wait-for-jobs` was set, they would timeout and fail. If it was not set, they would continue on to the next hook.
+The new switch would not be accepted by older versions of Helm.
 
 ## Security implications
 
-As jobs can already arbitrary code, this HIP does not introduce any new security implications -- only the ability to fail fast and display output.
+As jobs can already run arbitrary code, this HIP does not introduce any new security implications -- only the ability to display output.
 
 Potentially the preflight checks could check for security misconfigurations that could enhance the security of the cluster.
 
 ## How to teach this
 
-For one an example template would be provided showing how to use the new feature with Troubleshoot.sh to provide preflight checks.
+In the first instance, documentation plus the help text for `helm install` would explain the feature.
+
+An example template could be provided in documentation showing how to use this feature with a generic command used in a hook.
+
+A more advanced example showing how to use the new feature with Troubleshoot.sh to provide preflight checks could be linked in the documentation, provided directly in the documentation, or provided on the Troubleshoot.sh documentation site independently.
 
 ## Reference implementation
 
-The `safe-install` plugin (link in references) demonstrates what running preflights could look like, but not in the fashion implemented in this HIP.
+The [Troubleshoot Helm chart](https://github.com/xavpaice/helm-chart-troubleshoot) provides an example preflight, but currently lacks the new annotation and therefore does not delete resources after running. This would be updated when the annotation is implemented.
 
 ## Rejected ideas
-N/A
+
+There are other ways that this could be implemented. For example, we could have a separate preflight hook type. However, this new hook type wouldn't be handled at all by previous versions of Helm.
+
+Another strategy could be for Helm to include Troubleshoot.sh as a dependent library, but this could result in too tight a coupling between the projects and lower overall flexibility and adaptability.
+
+Use of an extra annotation, e.g. `"helm.sh/hook-output-log-policy": hook-succeeded, hook-failed`, was considered; however, that puts the choice of viewing logs in the hands of the chart developer rather than the user executing the install.
 
 ## Open issues
-N/A
 
-## References
+Two issues have been closed due to inactivity:
+
+* [#2298](https://github.com/helm/helm/issues/2298)
+* [#3481](https://github.com/helm/helm/issues/3481)
 
-[Troubleshoot.sh](https://troubleshoot.sh/) - the tool that is the motivation for this HIP. 
+## References
 
-[safe-install plugin](https://github.com/z4ce/helm-safe-install) - Plugin that provides a similiar experience to what I hope this HIP will provide natively.
+* [Troubleshoot.sh](https://troubleshoot.sh/) - the tool that is the motivation for this HIP.
+* [safe-install plugin](https://github.com/z4ce/helm-safe-install) - Plugin that provides a similar experience to what I hope this HIP will provide natively.
+* [Troubleshoot Helm chart](https://github.com/xavpaice/helm-chart-troubleshoot) - Example Helm chart with a pre-install hook including a Preflight check.
+* [Prior code PR](https://github.com/helm/helm/pull/10309) & [associated Docs PR](https://github.com/helm/helm-www/pull/1242) - similar PRs covering a slightly different implementation of the same topic
 
-# Reference - Examples Usage
+## Reference - Example Usage
 
-## Example using `false`
+### Example using `false`
 
 Template:
 ```yaml
@@ -92,7 +93,6 @@ metadata:
     # This is what defines this resource as a hook. Without this line, the
     # job is considered part of the release.
     "helm.sh/hook": pre-install, pre-upgrade
-    "helm.sh/hook-fail-fast": "true"
     "helm.sh/show-output": "true"
     "helm.sh/hook-weight": "-5"
     "helm.sh/hook-delete-policy": hook-succeeded, hook-failed
@@ -110,17 +110,20 @@ spec:
       containers:
       - name: post-install-job
         image: "alpine:3.3"
-        command: ["false"]
+        command: ["sh", "-c", "echo foo ; false"]
 ```
 
-What it should loook when running:
+What it should look like when running:
+
 ```
-$ helm install ./ my-release
-Fail-fast job failed: my-release-false-job
-Job my-release-false-job output: 
+$ helm install my-release ./ --atomic --show-hook-logs
+Error: INSTALLATION FAILED: failed pre-install: job failed: BackoffLimitExceeded
+Job output for my-release-false-job:
+foo
 ```
 
-## Example using Troubleshoot Preflight Checks
+### Example using Troubleshoot Preflight Checks
+
 ```yaml
 apiVersion: batch/v1
 kind: Job
@@ -135,12 +138,10 @@ metadata:
     # This is what defines this resource as a hook. Without this line, the
     # job is considered part of the release.
     "helm.sh/hook": pre-install, pre-upgrade
-    "helm.sh/hook-fail-fast": "true"
-    "helm.sh/show-output": "true"
     "helm.sh/hook-weight": "-5"
-    "helm.sh/hook-delete-policy": hook-succeeded, hook-failed
-
+    "helm.sh/hook-delete-policy": before-hook-creation, hook-succeeded, hook-failed
 spec:
+  backoffLimit: 0  # do not retry on failure
   template:
     metadata:
       name: "{{ .Release.Name }}"
@@ -149,95 +150,46 @@ spec:
         app.kubernetes.io/instance: {{ .Release.Name | quote }}
         helm.sh/chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
     spec:
+      serviceAccountName: "{{ .Release.Name }}-preflight"  # See references for full implementation
       restartPolicy: Never
       volumes:
         - name: preflights
-          configMap:
-            name: "{{ .Release.Name }}-preflight-config"
+          secret:
+            secretName: "{{ .Release.Name }}-preflight-config"  # See references for full implementation
+        - name: kube-api-token
+          projected:
+            defaultMode: 420
+            sources:
+              - serviceAccountToken:
+                  expirationSeconds: 3607
+                  path: token
       containers:
-      - name: post-install-job
-        image: "replicated/preflight:latest"
-        command: ["preflight", "--interactive=false", "--format json",  "/preflights/preflight.yaml"]
+      - name: pre-install-job
+        image: "{{ .Values.preflight.image }}"
+        command:
+          - "preflight"
+          - "--interactive=false"
+          - "/preflights/preflight.yaml"
         volumeMounts:
-        - name: preflights
+        - name: preflights  # See references for full implementation
           mountPath: /preflights
-
----
-apiVersion: v1
-kind: ConfigMap
-metadata:
-  annotations:
-    "helm.sh/hook": pre-install, pre-upgrade
-    "helm.sh/hook-weight": "-6"
-    "helm.sh/hook-delete-policy": hook-succeeded, hook-failed
-  labels:
-    app.kubernetes.io/managed-by: {{ .Release.Service | quote }}
-    app.kubernetes.io/instance: {{ .Release.Name | quote }}
-    app.kubernetes.io/version: {{ .Chart.AppVersion }}
-    helm.sh/chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
-  name: "{{ .Release.Name }}-preflight-config"
-data:
-  preflights.yaml: |
-    apiVersion: troubleshoot.sh/v1beta2
-    kind: Preflight
-    metadata:
-      name: preflight-tutorial
-    spec:
-      collectors:
-        {{ if eq .Values.mariadb.enabled false }}
-        - mysql:
-            collectorName: mysql
-            uri: '{{ .Values.externalDatabase.user }}:{{ .Values.externalDatabase.password }}@tcp({{ .Values.externalDatabase.host }}:{{ .Values.externalDatabase.port }})/{{ .Values.externalDatabase.database }}?tls=false'
-        {{ end }}
-      analyzers:
-        - clusterVersion:
-            outcomes:
-              - fail:
-                  when: "< 1.16.0"
-                  message: The application requires at least Kubernetes 1.16.0, and recommends 1.18.0.
-                  uri: https://kubernetes.io
-              - warn:
-                  when: "< 1.18.0"
-                  message: Your cluster meets the minimum version of Kubernetes, but we recommend you update to 1.18.0 or later.
-                  uri: https://kubernetes.io
-              - pass:
-                  message: Your cluster meets the recommended and required versions of Kubernetes.
-        {{ if eq .Values.mariadb.enabled false }}
-        - mysql:
-            checkName: Must be MySQL 8.x or later
-            collectorName: mysql
-            outcomes:
-              - fail:
-                  when: connected == false
-                  message: Cannot connect to MySQL server
-              - fail:
-                  when: version < 8.x
-                  message: The MySQL server must be at least version 8
-              - pass:
-                  message: The MySQL server is ready
-        {{ end }}
 ```
-What it should loook when running:
+
+What it should look like when running:
+
 ```
-$ helm install ./ my-release
-Fail-fast job failed: my-release-preflight-job
+$ helm install my-release ./ --atomic --show-hook-logs
+Error: INSTALLATION FAILED: failed pre-install: job failed: BackoffLimitExceeded
 Job my-release-preflight-job output: 
-name: cluster-resources    status: completed       completed: 1    total: 3
-name: mysql/mysql          status: running         completed: 1    total: 3
-name: mysql/mysql          status: completed       completed: 2    total: 3
-name: cluster-info         status: running         completed: 2    total: 3
-{
-  "fail": [
-    {
-      "title": "Required Kubernetes Version",
-      "message": "The application requires at least Kubernetes 1.16.0, and recommends 1.18.0.",
-      "uri": "https://kubernetes.io"
-    },
-    {
-      "title": "Must be MySQL 8.x or later",
-      "message": "Cannot connect to MySQL server"
-    }
-  ]
-}
-name: cluster-info         status: completed       completed: 3    total: 3
+name: cluster-info         status: running         completed: 0    total: 2   
+name: cluster-info         status: completed       completed: 1    total: 2   
+name: cluster-resources    status: running         completed: 1    total: 2   
+name: cluster-resources    status: completed       completed: 2    total: 2   
+
+   --- FAIL: Node Count Check
+      --- The cluster has less than 3 nodes.
+   --- PASS Required Kubernetes Version
+      --- Your cluster meets the recommended and required versions of Kubernetes.
+--- FAIL   preflight-tutorial
+FAILED
 ```