Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ROC Component #559

Merged
merged 3 commits into from
Dec 18, 2018
Merged

Fix ROC Component #559

merged 3 commits into from
Dec 18, 2018

Conversation

qimingj
Copy link
Contributor

@qimingj qimingj commented Dec 18, 2018

If it is a binary classification, which is usually the case when ROC is involved, we either expect a "trueclass" value to indicate the value of the true class, or expect the data is like:

predicted, target, true, false
false,true,0.1,0.9
...

in which there is a "true" column for the probability, and "true" is a predicted class.


This change is Reviewable

@@ -32,11 +32,13 @@
def main(argv=None):
parser = argparse.ArgumentParser(description='ML Trainer')
parser.add_argument('--predictions', type=str, help='GCS path of prediction file pattern.')
parser.add_argument('--trueclass', type=str, help='The name of the class as true value.')
parser.add_argument('--trueclass', type=str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit confusing to use trueclass for two different purposes: a) positive label for target column when target_lambda is missing. b) y_score column name.

Can we add a score_lambda for the second purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Added (slightly different).

fpr, tpr, thresholds = roc_curve(df['target'], df[args.trueclass])
roc_auc = roc_auc_score(df['target'], df[args.trueclass])
df['target'] = df['target'].apply(lambda x: 1 if x == trueclass else 0)
fpr, tpr, thresholds = roc_curve(df['target'], df[trueclass])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: roc_curve function supports positive label via pos_label parameter. You may need to use it instead of building the lambda yourself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Didn't know that before.

Now it seems I have to use lambda because I want to make it case sensitive for default value "True".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed my mind. It is case sensitive now. But since I need to support target_lambda, it may not always be a label. For example, the target_lambda could be: lambda x: x['a'] > x['b'].


trueclass = next((x for x in names if x.lower() == 'true'), None)
if trueclass is None:
raise ValueError('trueclass is not provided, and there is no "true" column.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the data from the tfx sample have the "True" column?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. The model outputs a "true" column and a "false" column.

@qimingj
Copy link
Contributor Author

qimingj commented Dec 18, 2018

PTAL

Copy link
Contributor

@hongye-sun hongye-sun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@qimingj
Copy link
Contributor Author

qimingj commented Dec 18, 2018

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qimingj

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit a23abf8 into kubeflow:master Dec 18, 2018
@gaoning777
Copy link
Contributor

gaoning777 commented Dec 18, 2018

Looks that there are still bugs in roc:
run-tfx-tests: roc: roc.py: error: unrecognized arguments: --target_lambda lambda x: 1 if (x['target'] > x['fare'] * 0.2) else 0
And, it happens to the confusion matrix, too.
confusion-matrix�[0m: confusion_matrix.py: error: unrecognized arguments: --target_lambda lambda x: (x['target'] > x['fare'] * 0.2)

@gaoning777
Copy link
Contributor

Found the bug.
the image are not replaced by the sample test after #518 has added confusion matrix and roc to the sample.

@qimingj
Copy link
Contributor Author

qimingj commented Dec 18, 2018

I think I did it manually?

Linchin pushed a commit to Linchin/pipelines that referenced this pull request Apr 11, 2023
… applications (kubeflow#559)

* Define a config file to list
  i) Relevant information about each application to build
     e.g the location of its source, the location of its kustomize manifest
 ii) The versions (i.e. branches) of the various repositories to build from

* Create a python script that takes the cross product of applications and
  versions and creates a tekton PipelineRun to build the docker image
  and update the manifest.

* Fix a bug in update_manifests with image_name not being set.
Related to kubeflow#450: Continuous delivery of Kubeflow applications

Here are some autogenerated PRs updating the manifests

  * kubeflow/manifests#697
  * kubeflow/manifests#696

* Fix email of kubeflow bot to pass CLA check.

  * Fix kubeflow#557

Skip lint due to kubeflow#560
magdalenakuhn17 pushed a commit to magdalenakuhn17/pipelines that referenced this pull request Oct 22, 2023
HumairAK pushed a commit to red-hat-data-services/data-science-pipelines that referenced this pull request Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants