Add trusted ai example (kubeflow#134)

* add trusted ai example
* add hyperlink
* update feature flag for example
Showing 4 changed files with 214 additions and 1 deletion.
@@ -0,0 +1,91 @@
# Trusted AI framework integration

Artificial intelligence is becoming a crucial component of enterprises' operations and strategy. To take advantage of AI responsibly, we must find ways to instill transparency, explainability, fairness, and robustness into AI. In this example, we go over how to produce fairness and robustness metrics using the Trusted AI libraries [AI Fairness 360](https://github.com/IBM/AIF360) and [Adversarial Robustness Toolbox](https://github.com/IBM/adversarial-robustness-toolbox).

This pipeline uses the [UTKFace aligned & cropped faces dataset](https://susanqq.github.io/UTKFace/) to train a gender classification model using the Katib engine. Once training completes, two extra tasks use the stored model and dataset to produce fairness and robustness metrics.

## Prerequisites
- Install [Kubeflow 1.0.2+](https://www.kubeflow.org/docs/started/getting-started/) and connect the cluster to the current shell with `kubectl`
- Install [Tekton 0.11.3](https://github.com/tektoncd/pipeline/releases/tag/v0.11.3) and the [Tekton CLI](https://github.com/tektoncd/cli)
- KFP components depend on their default work directory, which must not be modified. Run the command below to disable the [home and work directory overwrite](https://github.com/tektoncd/pipeline/blob/master/docs/install.md#customizing-the-pipelines-controller-behavior) in Tekton's defaults:
```shell
kubectl patch cm feature-flags -n tekton-pipelines -p '{"data":{"disable-home-env-overwrite":"true","disable-working-directory-overwrite":"true"}}'
```
- Install the [kfp-tekton](/sdk/README.md#steps) SDK

## Instructions

1. First, go to the Kubeflow dashboard and create a user namespace. The Kubeflow dashboard is served at the endpoint of your istio-ingressgateway. We will be using the namespace `anonymous` for this example.

2. Compile and apply the trusted-ai pipeline
```shell
dsl-compile-tekton --py trusted-ai.py --output trusted-ai.yaml
kubectl apply -f trusted-ai.yaml -n anonymous
```

3. Run the trusted-ai pipeline and press the `Enter` key to accept the default pipeline variables.
```shell
tkn pipeline start launch-katib-experiment -s default-editor -n anonymous --showlog
```

This pipeline runs for 10 to 15 minutes; at the end of the logs you should see the best hyperparameter tuning result.
```
? Value for param `name` of type `string`? (Default is `trusted-ai`) trusted-ai
? Value for param `namespace` of type `string`? (Default is `anonymous`) anonymous
? Value for param `goal` of type `string`? (Default is `0.99`) 0.99
? Value for param `parallelTrialCount` of type `string`? (Default is `1`) 1
? Value for param `maxTrialCount` of type `string`? (Default is `1`) 1
? Value for param `experimentTimeoutMinutes` of type `string`? (Default is `60`) 60
? Value for param `deleteAfterDone` of type `string`? (Default is `True`) True
? Value for param `fgsm_attack_epsilon` of type `string`? (Default is `0.2`) 0.2
? Value for param `model_class_file` of type `string`? (Default is `PyTorchModel.py`) PyTorchModel.py
? Value for param `model_class_name` of type `string`? (Default is `ThreeLayerCNN`) ThreeLayerCNN
? Value for param `feature_testset_path` of type `string`? (Default is `processed_data/X_test.npy`) processed_data/X_test.npy
? Value for param `label_testset_path` of type `string`? (Default is `processed_data/y_test.npy`) processed_data/y_test.npy
? Value for param `protected_label_testset_path` of type `string`? (Default is `processed_data/p_test.npy`) processed_data/p_test.npy
? Value for param `favorable_label` of type `string`? (Default is `0.0`) 0.0
? Value for param `unfavorable_label` of type `string`? (Default is `1.0`) 1.0
? Value for param `privileged_groups` of type `string`? (Default is `[{'race': 0.0}]`) [{'race': 0.0}]
? Value for param `unprivileged_groups` of type `string`? (Default is `[{'race': 4.0}]`) [{'race': 4.0}]
? Value for param `loss_fn` of type `string`? (Default is `torch.nn.CrossEntropyLoss()`) torch.nn.CrossEntropyLoss()
? Value for param `optimizer` of type `string`? (Default is `torch.optim.Adam(model.parameters(), lr=0.001)`) torch.optim.Adam(model.parameters(), lr=0.001)
? Value for param `clip_values` of type `string`? (Default is `(0, 1)`) (0, 1)
? Value for param `nb_classes` of type `string`? (Default is `2`) 2
? Value for param `input_shape` of type `string`? (Default is `(1,3,64,64)`) (1,3,64,64)
Pipelinerun started: launch-ai-ethics-experiment-run-96lqr
Waiting for logs to be available...
[kubeflow-launch-experiment : kubeflow-launch-experiment] INFO:root:Generating experiment template.
[kubeflow-launch-experiment : kubeflow-launch-experiment] INFO:root:Creating kubeflow.org/experiments ai-ethics in namespace anonymous.
[kubeflow-launch-experiment : kubeflow-launch-experiment] INFO:root:Created kubeflow.org/experiments ai-ethics in namespace anonymous.
[kubeflow-launch-experiment : kubeflow-launch-experiment] INFO:root:Current condition of kubeflow.org/experiments ai-ethics in namespace anonymous is Created.
[kubeflow-launch-experiment : kubeflow-launch-experiment] INFO:root:Current condition of kubeflow.org/experiments ai-ethics in namespace anonymous is Running.
[kubeflow-launch-experiment : kubeflow-launch-experiment] INFO:root:Current condition of kubeflow.org/experiments ai-ethics in namespace anonymous is Running.
[kubeflow-launch-experiment : kubeflow-launch-experiment] INFO:root:Current condition of kubeflow.org/experiments ai-ethics in namespace anonymous is Running.
[kubeflow-launch-experiment : kubeflow-launch-experiment] INFO:root:Current condition of kubeflow.org/experiments ai-ethics in namespace anonymous is Running.
[kubeflow-launch-experiment : kubeflow-launch-experiment] INFO:root:kubeflow.org/experiments ai-ethics in namespace anonymous has reached the expected condition: Succeeded.
[kubeflow-launch-experiment : kubeflow-launch-experiment] INFO:root:Deleteing kubeflow.org/experiments ai-ethics in namespace anonymous.
[kubeflow-launch-experiment : kubeflow-launch-experiment] INFO:root:Deleted kubeflow.org/experiments ai-ethics in namespace anonymous.
[model-fairness-check : model-fairness-check] #### Plain model - without debiasing - classification metrics on test set
[model-fairness-check : model-fairness-check] metrics: {'Classification accuracy': 0.8736901727555934, 'Balanced classification accuracy': 0.8736589340231593, 'Statistical parity difference': -0.08945831914198538, 'Disparate impact': 0.8325187064870843, 'Equal opportunity difference': -0.04179856216322675, 'Average odds difference': -0.028667217801755983, 'Theil index': 0.09022707769476579, 'False negative rate difference': 0.041798562163226846}
[adversarial-robustness-evaluation : adversarial-robustness-evaluation] metrics: {'model accuracy on test data': 0.8736901727555934, 'model accuracy on adversarial samples': 0.13565562163693004, 'confidence reduced on correctly classified adv_samples': 0.24403343492049726, 'average perturbation on misclassified adv_samples': 0.40323618054389954}
```

Below are the metric definitions for this example. A short code sketch of how each set of metrics can be computed follows each list.

**Fairness Metrics**
- **Classification accuracy**: Fraction of correct predictions on the test data. Ideal value: 1
- **Balanced classification accuracy**: Balanced accuracy over the true positive and true negative rates (0.5*(TPR+TNR)) on the test data. Ideal value: 1
- **Statistical parity difference**: Difference in the rate of favorable outcomes received by the unprivileged group versus the privileged group. Ideal value: 0 (-0.1 to 0.1 is considered fair)
- **Disparate impact**: Ratio of the rate of favorable outcomes for the unprivileged group to that of the privileged group. Ideal value: 1 (0.8 to 1.2 is considered fair)
- **Equal opportunity difference**: Difference in true positive rates between the unprivileged and privileged groups. Ideal value: 0 (-0.1 to 0.1 is considered fair)
- **Average odds difference**: Average of the differences in false positive rate (false positives / negatives) and true positive rate (true positives / positives) between the unprivileged and privileged groups. Ideal value: 0 (-0.1 to 0.1 is considered fair)
- **Theil index**: Generalized entropy of benefit for all individuals in the dataset; it measures the inequality in benefit allocation across individuals. Ideal value: 0 (0 is perfect fairness; there is no concrete interval considered fair for this metric)
- **False negative rate difference**: Difference in false negative rates between the unprivileged and privileged groups. Ideal value: 0 (-0.1 to 0.1 is considered fair)
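
As a reference for these definitions, here is a minimal, hypothetical sketch of how a few of the fairness numbers can be computed with AIF360 directly. The arrays `y_test`, `y_pred`, and `p_test` are placeholders standing in for `processed_data/y_test.npy`, the model's predictions, and `processed_data/p_test.npy`; the group definitions mirror the pipeline defaults.
```python
# Minimal AIF360 sketch (assumes `pip install aif360 pandas numpy`).
import numpy as np
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import ClassificationMetric

y_test = np.array([0., 1., 0., 1.])  # true labels (hypothetical)
y_pred = np.array([0., 1., 1., 1.])  # model predictions (hypothetical)
p_test = np.array([0., 4., 0., 4.])  # protected attribute 'race' (hypothetical)

def to_dataset(labels):
    # Wrap the labels plus the protected attribute into an AIF360 dataset.
    df = pd.DataFrame({'race': p_test, 'label': labels})
    return BinaryLabelDataset(df=df, label_names=['label'],
                              protected_attribute_names=['race'],
                              favorable_label=0.0, unfavorable_label=1.0)

metric = ClassificationMetric(to_dataset(y_test), to_dataset(y_pred),
                              privileged_groups=[{'race': 0.0}],
                              unprivileged_groups=[{'race': 4.0}])
print({'Classification accuracy': metric.accuracy(),
       'Statistical parity difference': metric.statistical_parity_difference(),
       'Disparate impact': metric.disparate_impact(),
       'Equal opportunity difference': metric.equal_opportunity_difference(),
       'Average odds difference': metric.average_odds_difference(),
       'Theil index': metric.theil_index()})
```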

**Robustness Metrics**
- **Model accuracy on test data**: Fraction of correct predictions on the original test data. Ideal value: 1
- **Model accuracy on adversarial samples**: Fraction of correct predictions on the adversarial test samples. Ideal value: 1
- **Reduction in confidence**: Average amount by which the confidence score is reduced on correctly classified adversarial samples. Ideal value: 0
- **Average perturbation**: Average amount of [adversarial change](https://en.wikipedia.org/wiki/Perturbation_theory) needed to fool the classifier. Ideal value: 1
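
Similarly, here is a rough sketch of the FGSM evaluation behind the robustness numbers, using ART directly. The stand-in model and random test arrays are hypothetical; `eps`, `clip_values`, and `nb_classes` mirror the pipeline defaults, and `input_shape` here is the per-sample shape rather than the batched `(1,3,64,64)` string passed to the component.
```python
# Rough ART FGSM sketch (assumes `pip install adversarial-robustness-toolbox torch`).
import numpy as np
import torch
from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import PyTorchClassifier

# Stand-in for the trained ThreeLayerCNN gender classifier.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 2))
classifier = PyTorchClassifier(model=model,
                               loss=torch.nn.CrossEntropyLoss(),
                               optimizer=torch.optim.Adam(model.parameters(), lr=0.001),
                               input_shape=(3, 64, 64),
                               nb_classes=2,
                               clip_values=(0, 1))

x_test = np.random.rand(8, 3, 64, 64).astype(np.float32)  # hypothetical test images
y_test = np.random.randint(0, 2, 8)                       # hypothetical labels

# Craft adversarial samples with the Fast Gradient Sign Method.
x_adv = FastGradientMethod(estimator=classifier, eps=0.2).generate(x=x_test)

acc = (classifier.predict(x_test).argmax(axis=1) == y_test).mean()
adv_acc = (classifier.predict(x_adv).argmax(axis=1) == y_test).mean()
print({'model accuracy on test data': acc,
       'model accuracy on adversarial samples': adv_acc,
       'average perturbation': np.mean(np.abs(x_adv - x_test))})
```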
@@ -0,0 +1,117 @@
import json
from kfp import components
import kfp.dsl as dsl


# Reusable components: the Katib experiment launcher, the AIF360 fairness
# checker, and the ART FGSM robustness evaluator.
katib_experiment_launcher_op = components.load_component_from_url('https://raw.githubusercontent.com/kubeflow/pipelines/master/components/kubeflow/katib-launcher/component.yaml')
fairness_check_ops = components.load_component_from_url('https://raw.githubusercontent.com/IBM/AIF360/master/mlops/kubeflow/bias_detector_pytorch/component.yaml')
robustness_check_ops = components.load_component_from_url('https://raw.githubusercontent.com/IBM/adversarial-robustness-toolbox/master/mlops/kubeflow/robustness_evaluation_fgsm_pytorch/component.yaml')


@dsl.pipeline(
    name="Launch trusted-ai pipeline",
    description="An example for trusted-ai integration."
)
def trusted_ai(
        name="trusted-ai",
        namespace="anonymous",
        goal="0.99",
        parallelTrialCount="1",
        maxTrialCount="1",
        experimentTimeoutMinutes="60",
        deleteAfterDone="True",
        fgsm_attack_epsilon='0.2',
        model_class_file='PyTorchModel.py',
        model_class_name='ThreeLayerCNN',
        feature_testset_path='processed_data/X_test.npy',
        label_testset_path='processed_data/y_test.npy',
        protected_label_testset_path='processed_data/p_test.npy',
        favorable_label='0.0',
        unfavorable_label='1.0',
        privileged_groups="[{'race': 0.0}]",
        unprivileged_groups="[{'race': 4.0}]",
        loss_fn='torch.nn.CrossEntropyLoss()',
        optimizer='torch.optim.Adam(model.parameters(), lr=0.001)',
        clip_values='(0, 1)',
        nb_classes='2',
        input_shape='(1,3,64,64)'):
    # Katib objective: maximize the trials' reported accuracy until the goal is reached.
    objectiveConfig = {
        "type": "maximize",
        "goal": goal,
        "objectiveMetricName": "accuracy",
        "additionalMetricNames": []
    }
    algorithmConfig = {"algorithmName": "random"}
    # The training image has no real hyperparameters to tune, so a dummy
    # parameter satisfies Katib's requirement for a non-empty search space.
    parameters = [
        {"name": "--dummy", "parameterType": "int", "feasibleSpace": {"min": "1", "max": "2"}},
    ]
    # Kubernetes Job template used for each Katib trial. Note that `env`
    # must live inside the container spec, not at the pod spec level.
    rawTemplate = {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": "{{.Trial}}",
            "namespace": "{{.NameSpace}}"
        },
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": "{{.Trial}}",
                         "image": "aipipeline/gender-classification:latest",
                         "command": [
                             "python", "-u", "gender_classification_training.py",
                             "--data_bucket", "mlpipeline", "--result_bucket", "mlpipeline"
                         ],
                         "env": [{'name': 'S3_ENDPOINT', 'value': 'minio-service.kubeflow:9000'}]}
                    ]
                }
            }
        }
    }
    trialTemplate = {
        "goTemplate": {
            "rawTemplate": json.dumps(rawTemplate)
        }
    }
    # Launch the Katib experiment that trains the gender classification model.
    katib_run = katib_experiment_launcher_op(
        experiment_name=name,
        experiment_namespace=namespace,
        parallel_trial_count=parallelTrialCount,
        max_trial_count=maxTrialCount,
        objective=str(objectiveConfig),
        algorithm=str(algorithmConfig),
        trial_template=str(trialTemplate),
        parameters=str(parameters),
        experiment_timeout_minutes=experimentTimeoutMinutes,
        delete_finished_experiment=deleteAfterDone)

    # After training, evaluate the stored model and test set for fairness (AIF360)...
    fairness_check = fairness_check_ops(model_id='training-example',
                                        model_class_file=model_class_file,
                                        model_class_name=model_class_name,
                                        feature_testset_path=feature_testset_path,
                                        label_testset_path=label_testset_path,
                                        protected_label_testset_path=protected_label_testset_path,
                                        favorable_label=favorable_label,
                                        unfavorable_label=unfavorable_label,
                                        privileged_groups=privileged_groups,
                                        unprivileged_groups=unprivileged_groups,
                                        data_bucket_name='mlpipeline',
                                        result_bucket_name='mlpipeline').after(katib_run).set_image_pull_policy("Always")
    # ... and for adversarial robustness against the FGSM attack (ART).
    robustness_check = robustness_check_ops(model_id='training-example',
                                            epsilon=fgsm_attack_epsilon,
                                            model_class_file=model_class_file,
                                            model_class_name=model_class_name,
                                            feature_testset_path=feature_testset_path,
                                            label_testset_path=label_testset_path,
                                            loss_fn=loss_fn,
                                            optimizer=optimizer,
                                            clip_values=clip_values,
                                            nb_classes=nb_classes,
                                            input_shape=input_shape,
                                            data_bucket_name='mlpipeline',
                                            result_bucket_name='mlpipeline').after(katib_run).set_image_pull_policy("Always")


if __name__ == '__main__':
    from kfp_tekton.compiler import TektonCompiler
    TektonCompiler().compile(trusted_ai, __file__.replace('.py', '.yaml'))
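
# As an alternative to the `dsl-compile-tekton` + `kubectl apply` flow in the
# README, recent kfp-tekton SDK versions also ship a `TektonClient` that can
# submit the pipeline directly. This is an assumption about the SDK version in
# use, and the endpoint URL below is a placeholder:
#
#   from kfp_tekton import TektonClient
#   client = TektonClient(host='http://localhost:8888')
#   client.create_run_from_pipeline_func(trusted_ai, arguments={}, namespace='anonymous')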