-
Notifications
You must be signed in to change notification settings - Fork 361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AUC chart #2171
base: main
Are you sure you want to change the base?
AUC chart #2171
Conversation
const rocData: IROCData = { | ||
points: [] | ||
}; | ||
const thresholds = range(0, 1, 0.01); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not quite right, is it?
For example, if I have two points within a tick (let's say at 0.001 and 0.004) then none of my thresholds covers the case where they are on opposite sides of the decision boundary. This could very well be the best FPR/TPR combination....
So I would order the values first, and then always take thresholds as the middle between two values, e.g.,
values: [1,5,4,2,3]
ordered: [1,2,3,4,5]
thresholds: [-inf, 1.5, 2.5, 3.5, 4.5, inf]
so total of 6 points for ROC, but then you still need to calculate the convex hull of those points if I'm not mistaken. How are you avoiding hitting points below the convex hull btw?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps on the wrong track here, but it seems like we might miss something by just looking at 100 thresholds vs all possible thresholds between values (after sorting).
describe("Test calculatePerClassROCData", () => { | ||
it("generate x,y data corresponding to fpr and tpr respectively", () => { | ||
const result = calculatePerClassROCData( | ||
[0.33, 0.32, 0.34, 0.29, 0.12, 0.41, 0.4, 0.39], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you add zeros after the comma to get all the points closer together (0.032, 0.033, 0.034, for example), then it will not do it right, will it?
|
||
// TODO: for some reason in the adult census data, classnames doesn't match the true_y data | ||
// need to confirm how this data aligns. temporarily hard code classnames to 0 and 1 for now | ||
const cNames = [0, 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should only run for binary classification, right? So we probably need a check somewhere to disable the component otherwise.
For multiclass one could do one vs all but I don't know if anyone wants that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the studio, there's an auc chart for multiclass so i assumed we'd want that (binarizeData is supposed to handle this case), but i'll discuss with Minsoo tomorrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's probably 1 vs all, so that makes sense.
@@ -1828,6 +1830,7 @@ | |||
"regressionDistributionPivotItem": "Target distribution", | |||
"metricsVisualizationsPivotItem": "Metrics visualizations", | |||
"confusionMatrixPivotItem": "Confusion matrix", | |||
"AUCPivotItem": "AUC Chart", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, but we'll probably need a little explainer somewhere. Especially because it's an acronym
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will discuss with Minsoo!
// aucChartExperienceFlight, | ||
// this.context.featureFlights | ||
// ) && ( | ||
// TODO: support multiclass: IsMulticlass(this.context.modelMetadata.modelType)) && ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you would need a drop-down to select the one vs all class and then generate the 0s and 1s based on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the idea would be to show a line for each class; does that make sense ? the user can optionally disable the line's visibility (see the roc chart here https://ml.azure.com/experiments/id/e01fdc58-9702-4fb8-add9-1a5d09b21058/runs/iris-automl-1_24?wsid=/subscriptions/e0fd569c-e34a-4249-8c24-e8d723c7f054/resourceGroups/hawestra-rg/providers/Microsoft.MachineLearningServices/workspaces/hawestra-ws&tid=72f988bf-86f1-41af-91ab-2d7cd011db47#metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will be tricky if you have one line up per cohort as well. Or are we showing just for current cohort?
...del-assessment/src/lib/ModelAssessmentDashboard/Controls/ModelOverview/getAUCChartOptions.ts
Outdated
Show resolved
Hide resolved
originally, i was looking at 100 thresholds because this is how the metrics package does it but i changed that to look at the y probabilities themselves and go through each of these as the threshold |
Description
first draft to add AUC chart to model overview; still needs PM input
Checklist