Multi Objective CNN benchmark - [WIP] #147
Conversation
ayushi-3536 commented on May 6, 2022
- Added benchmark outline for the CNN benchmarks from the paper
- Pulling from the MO interface by Philipp is pending
- Changed the lock name for the mo_cnn benchmark
- Removed the hard-coded model input to support multiple datasets (see the sketch below)
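To illustrate the last point, here is a minimal sketch of a dataset-agnostic model constructor; the function name and layer sizes are made up for illustration and are not taken from this PR:

```python
import torch.nn as nn

def build_cnn(n_channels: int, n_classes: int, image_size: int) -> nn.Module:
    """Build a small CNN whose input and output sizes come from the dataset
    instead of being hard-coded for a single dataset (illustrative only)."""
    return nn.Sequential(
        nn.Conv2d(n_channels, 16, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(16 * (image_size // 2) ** 2, n_classes),
    )

# The same constructor then covers, e.g., Fashion-MNIST (1x28x28, 10 classes)
# and a 3-channel 32x32 dataset with 10 classes.
fashion_mnist_model = build_cnn(n_channels=1, n_classes=10, image_size=28)
rgb_model = build_cnn(n_channels=3, n_classes=10, image_size=32)
```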
This reverts commit f76ea99.
…ation of various datasets
- changed epochs from 50 to 25 (from the literature)
- corrected epoch training (0-indexed)
- removed subsample from the fidelity space (not done in the literature; we can discuss adding it if we want to run experiments on this)
- returning a Python object
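A minimal sketch of what the resulting fidelity space could look like with the subsample fidelity removed and training capped at 25 epochs; the hyperparameter name `budget` and the default value are assumptions, not confirmed by this PR:

```python
import ConfigSpace as CS

def get_fidelity_space(seed: int = None) -> CS.ConfigurationSpace:
    """Fidelity space with the number of training epochs as the only fidelity,
    capped at 25 epochs as reported in the literature (sketch)."""
    fidelity_space = CS.ConfigurationSpace(seed=seed)
    fidelity_space.add_hyperparameter(
        CS.UniformIntegerHyperparameter('budget', lower=1, upper=25, default_value=25)
    )
    return fidelity_space
```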
Codecov Report
@@             Coverage Diff              @@
##           development     #147      +/-   ##
===============================================
- Coverage        44.26%    42.23%     -2.04%
===============================================
  Files               41        46         +5
  Lines             2415      2671       +256
===============================================
+ Hits              1069      1128        +59
- Misses            1346      1543       +197
- Merged fidelity space and choice method
@KEggensperger, could you please have a look at it?
val_accuracy = model.eval_fn(ds_val, device).item()
eval_valid_runtime = time.time() - start
start = time.time()
test_accuracy = model.eval_fn(ds_test, device).item()
Same questions as for the other benchmark: why spend time computing test metrics?
Good question. Changed it to "training time".
The evaluation time should be almost equal for every run, so I think it is more important to report the "training time" than the "total time per configuration".
Thanks for the feedback!
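To make that timing split concrete, here is a minimal, self-contained sketch; `train` and `evaluate` are stand-ins (only `model.eval_fn` appears in the diff above, the training call and sleep durations are placeholders):

```python
import time

def train(num_epochs: int) -> None:
    """Stand-in for the actual CNN training loop."""
    time.sleep(0.01 * num_epochs)

def evaluate() -> float:
    """Stand-in for the model's evaluation call; returns a dummy accuracy."""
    time.sleep(0.01)
    return 0.5

start = time.time()
train(num_epochs=25)
training_runtime = time.time() - start      # reported as the benchmark's cost

start = time.time()
val_accuracy = evaluate()
eval_valid_runtime = time.time() - start    # roughly constant per run, tracked but not used as cost
```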
- Update test case
Added MO CNN benchmarks from the Bag of Baselines paper.

We deviate from the original benchmark in two points:
* we return only the training time as cost instead of the total elapsed time
* we return `1 - accuracy` instead of `-100 * accuracy` as the objective to minimize, to achieve better output scaling

Co-authored-by: ayushi-3536 <ayushi-3536@github.com>
Co-authored-by: Philipp Müller <muller-phil@gmx.net>
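A minimal sketch of how those two deviations could appear in the returned result dictionary; the key names follow the usual HPOBench result layout but are assumptions here, not copied from the merged code:

```python
def build_result(val_accuracy: float, training_runtime: float) -> dict:
    """Assemble the benchmark result with the two deviations applied:
    the objective is 1 - accuracy (instead of -100 * accuracy) and the
    cost is the training time only (instead of the total elapsed time)."""
    return {
        'function_value': {'misclassification_rate': 1.0 - val_accuracy},
        'cost': training_runtime,
        'info': {'valid_accuracy': val_accuracy},
    }

print(build_result(val_accuracy=0.75, training_runtime=120.0))
# -> {'function_value': {'misclassification_rate': 0.25}, 'cost': 120.0, 'info': {'valid_accuracy': 0.75}}
```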