-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ROC Component #559
Fix ROC Component #559
Conversation
components/local/roc/src/roc.py
Outdated
@@ -32,11 +32,13 @@ | |||
def main(argv=None): | |||
parser = argparse.ArgumentParser(description='ML Trainer') | |||
parser.add_argument('--predictions', type=str, help='GCS path of prediction file pattern.') | |||
parser.add_argument('--trueclass', type=str, help='The name of the class as true value.') | |||
parser.add_argument('--trueclass', type=str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit confusing to use trueclass for two different purposes: a) positive label for target column when target_lambda is missing. b) y_score column name.
Can we add a score_lambda for the second purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Added (slightly different).
components/local/roc/src/roc.py
Outdated
fpr, tpr, thresholds = roc_curve(df['target'], df[args.trueclass]) | ||
roc_auc = roc_auc_score(df['target'], df[args.trueclass]) | ||
df['target'] = df['target'].apply(lambda x: 1 if x == trueclass else 0) | ||
fpr, tpr, thresholds = roc_curve(df['target'], df[trueclass]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: roc_curve function supports positive label via pos_label parameter. You may need to use it instead of building the lambda yourself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Didn't know that before.
Now it seems I have to use lambda because I want to make it case sensitive for default value "True".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed my mind. It is case sensitive now. But since I need to support target_lambda, it may not always be a label. For example, the target_lambda could be: lambda x: x['a'] > x['b'].
components/local/roc/src/roc.py
Outdated
|
||
trueclass = next((x for x in names if x.lower() == 'true'), None) | ||
if trueclass is None: | ||
raise ValueError('trueclass is not provided, and there is no "true" column.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the data from the tfx sample have the "True" column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. The model outputs a "true" column and a "false" column.
PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: qimingj The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Looks that there are still bugs in roc: |
Found the bug. |
I think I did it manually? |
… applications (kubeflow#559) * Define a config file to list i) Relevant information about each application to build e.g the location of its source, the location of its kustomize manifest ii) The versions (i.e. branches) of the various repositories to build from * Create a python script that takes the cross product of applications and versions and creates a tekton PipelineRun to build the docker image and update the manifest. * Fix a bug in update_manifests with image_name not being set. Related to kubeflow#450: Continuous delivery of Kubeflow applications Here are some autogenerated PRs updating the manifests * kubeflow/manifests#697 * kubeflow/manifests#696 * Fix email of kubeflow bot to pass CLA check. * Fix kubeflow#557 Skip lint due to kubeflow#560
If it is a binary classification, which is usually the case when ROC is involved, we either expect a "trueclass" value to indicate the value of the true class, or expect the data is like:
predicted, target, true, false
false,true,0.1,0.9
...
in which there is a "true" column for the probability, and "true" is a predicted class.
This change is