
Validate feature: setting baseline models #266


Merged: 11 commits into master from validate/set_model_baseline, Mar 29, 2022

Conversation

@sasha-scale (Contributor) commented Mar 24, 2022

Changes the interface for Scenario Test creation so that thresholds no longer need to be set up front. Instead of passing in test criteria, the user initializes the test with a list of evaluation functions, from which criteria are created with a null threshold.

Users can later customize this threshold with metric.set_threshold.
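
A rough sketch of the new flow (the create_scenario_test call mirrors the repro later in this thread; the metric accessor and threshold value are illustrative assumptions, not a definitive API):

```python
import os

import nucleus

client = nucleus.NucleusClient(os.environ["API_KEY"])

# Criteria are no longer passed up front: the test is created from a list of
# evaluation functions, and each resulting criterion starts with a null threshold.
scenario_test = client.validate.create_scenario_test(
    "baseline test images",                      # test name (placeholder)
    slice_id="slc_c81fxadwzftg238jb5zg",         # slice ID reused from the repro below
    evaluation_functions=[client.validate.eval_functions.bbox_iou()],
)

# Later, customize the threshold on an individual metric; the accessor name is
# assumed from the get_eval_functions rename agreed on later in this thread.
metric = scenario_test.get_eval_functions()[0]
metric.set_threshold(0.5)  # placeholder threshold value
```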

This PR also introduces the API interface for setting a model as a baseline for a whole unit test.
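
A hedged sketch of that baseline interface as well; the PR text only says a model can be set as the baseline for a whole test, so set_baseline_model is a hypothetical method name:

```python
# Hypothetical call: mark an existing model as the baseline for this scenario test.
scenario_test.set_baseline_model("model_id_placeholder")
```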

Implemented pytest coverage for both.

Shortcut ticket: https://app.shortcut.com/scaleai/story/399936/remove-manual-thresholding-from-api-interface-for-validate

Corresponding scaleapi side PR: https://github.com/scaleapi/scaleapi/pull/37635

Note: the build won't pass until the scaleapi side is deployed, but we expect pytest to pass when run locally against those changes.

@pfmark (Contributor) commented Mar 25, 2022

Running into
{"status_code":400,"error":"Input schema validation failed: \"evaluationCriteria\" is required When reporting an issue add the request_id:'040ed360-593d-9316-b99a-142b41d3f4c1'"}

when setting up a test through the API with

client.validate.create_scenario_test("new baseline test images", slice_id="slc_c81fxadwzftg238jb5zg", evaluation_functions=[client.validate.eval_functions.bbox_iou()])

I guess this is some Pydantic issue, right?

@sasha-scale (Contributor, Author)

If you have difficulty testing this PR (getting 400 errors), you are probably hitting the production scaleapi server instead of the local feature branch.

You would initialize the client like this: client = nucleus.NucleusClient(os.environ.get("API_KEY"), endpoint='http://localhost:3000/v1/nucleus'), where localhost:3000 is serving the other feature branch.

To run pytest against the feature branch, run export NUCLEUS_ENDPOINT='http://localhost:3000/v1/nucleus' in the shell where you're running pytest.
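
Restated as a minimal, runnable sketch (assuming the scaleapi feature branch is serving on localhost:3000):

```python
import os

import nucleus

# Point the client at the local scaleapi feature branch instead of production.
client = nucleus.NucleusClient(
    os.environ.get("API_KEY"),
    endpoint="http://localhost:3000/v1/nucleus",
)
```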

@sasha-scale requested a review from a team March 25, 2022 23:17
@pfmark (Contributor) commented Mar 28, 2022

We also need to update the scenario_test.add_criterion(...) method.

I tried st.add_criterion(client.validate.eval_functions.bbox_map()), which doesn't work.

Let's also rename it to add_eval_function(...) in order to avoid confusion.

@sasha-scale (Contributor, Author)

Renamed add_criterion to add_eval_function
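
With the rename, the call from the comment above becomes (st and bbox_map are reused from the earlier repro; a sketch, not a definitive signature):

```python
# Attach an additional evaluation function to an existing scenario test.
st.add_eval_function(client.validate.eval_functions.bbox_map())
```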

@pfmark (Contributor) commented Mar 28, 2022

Tested locally and interaction with the backend works!

My bad, I forgot this earlier: let's also rename metric.get_criteria() to metric.get_eval_functions() to keep it consistent.

@pfmark self-requested a review March 28, 2022 20:43
@pfmark (Contributor) left a comment

Let's test after the backend is deployed, but LGTM after the rename suggestion.

@sasha-scale merged commit c9f1f59 into master Mar 29, 2022
@sasha-scale deleted the validate/set_model_baseline branch March 29, 2022 23:57
gatli pushed a commit that referenced this pull request Mar 30, 2022
add new set model as baseline functions to client, remove add_criteria in favor of add_eval_function, bump version number and changelog