
Conversation

ayushi-3536

  • Added benchmark outline for the CNN benchmarks from the paper
  • Pulling from the MO interface by Philipp is pending

- changed the lock name for the mo_cnn benchmark
- removed the hard-coded model input to support multiple datasets
…ation of various datasets

- changed epochs from 50 to 25 (following the literature)
- corrected epoch training (0-indexed)
- removed subsample from the fidelity space (not done in the literature; we can discuss adding it if we want to run experiments on this)
- returning a Python object (see the sketch below)
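
For context, a minimal usage sketch of how the updated benchmark would be queried through HPOBench's standard interface. The class name `CNNBenchmark`, the `dataset` keyword, and the fidelity key `budget` are assumptions; only the generic `get_configuration_space` / `objective_function` pattern returning a result dict is taken from the HPOBench base API.

```python
# Hypothetical usage sketch -- class name, dataset kwarg and fidelity key are assumptions.
from hpobench.container.benchmarks.mo.cnn_benchmark import CNNBenchmark  # assumed class name

benchmark = CNNBenchmark(dataset="fashion", rng=1)   # dataset selection instead of a hard-coded model input (assumed kwarg)
config = benchmark.get_configuration_space(seed=1).sample_configuration()

# Fidelity is the number of training epochs (now at most 25); there is no subsample fidelity.
result = benchmark.objective_function(configuration=config, fidelity={"budget": 25}, rng=1)

print(result["function_value"])   # multi-objective values, returned as a plain Python dict
print(result["cost"])             # cost of evaluating this configuration
```
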
@codecov

codecov bot commented May 17, 2022

Codecov Report

Merging #147 (cf7ccfa) into development (9dde397) will decrease coverage by 2.03%.
The diff coverage is 30.18%.

❗ Current head cf7ccfa differs from pull request most recent head 761a7ee. Consider uploading reports for the commit 761a7ee to get more accurate results

Impacted file tree graph

@@               Coverage Diff               @@
##           development     #147      +/-   ##
===============================================
- Coverage        44.26%   42.23%   -2.04%     
===============================================
  Files               41       46       +5     
  Lines             2415     2671     +256     
===============================================
+ Hits              1069     1128      +59     
- Misses            1346     1543     +197     
| Impacted Files | Coverage Δ |
| --- | --- |
| hpobench/util/data_manager.py | 47.04% <15.78%> (-10.36%) ⬇️ |
| hpobench/container/benchmarks/mo/cnn_benchmark.py | 66.66% <66.66%> (ø) |
| hpobench/container/client_abstract_benchmark.py | 85.64% <0.00%> (-1.39%) ⬇️ |
| ...bench/container/benchmarks/surrogates/yahpo_gym.py | 100.00% <0.00%> (ø) |
| hpobench/dependencies/mo/scalar.py | 0.00% <0.00%> (ø) |
| hpobench/dependencies/mo/fairness_metrics.py | 0.00% <0.00%> (ø) |
| ...pobench/container/benchmarks/mo/adult_benchmark.py | 100.00% <0.00%> (ø) |
| hpobench/container/benchmarks/nas/nasbench_201.py | 36.84% <0.00%> (+36.84%) ⬆️ |

@PhMueller PhMueller requested a review from KEggensperger May 24, 2022 12:04
@PhMueller
Contributor

@KEggensperger, could you please have a look at it?

val_accuracy = model.eval_fn(ds_val, device).item()
eval_valid_runtime = time.time() - start
start = time.time()
test_accuracy = model.eval_fn(ds_test, device).item()
Contributor

Same question as for the other benchmark: why spend time on computing test metrics?

Contributor

@PhMueller PhMueller May 30, 2022

Good question. Changed it to "training time".

The eval time should be almost equal for every run, so I think it is more important to report the "training time" rather than the "total time per configuration".

Thanks for the feedback!
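
To illustrate the timing split being discussed, here is a rough sketch; `train_fn` is a placeholder name, the `eval_fn` call mirrors the snippet quoted above, and the function arguments stand in for objects the benchmark sets up internally.

```python
import time

def evaluate(model, ds_train, ds_val, device):
    """Timing-split sketch: only the training phase counts towards the reported cost."""
    start = time.time()
    model.train_fn(ds_train, device)                      # training phase (placeholder name)
    training_runtime = time.time() - start                # reported as the benchmark's cost

    start = time.time()
    val_accuracy = model.eval_fn(ds_val, device).item()   # evaluation phase
    eval_valid_runtime = time.time() - start              # roughly constant per run, so excluded from the cost

    return val_accuracy, training_runtime, eval_valid_runtime
```
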

@PhMueller PhMueller merged commit 4c4f1d9 into automl:development Jun 1, 2022
PhMueller added a commit to PhMueller/HPOBench that referenced this pull request Feb 21, 2023
Added MO CNN benchmarks from the Bag of Baselines paper

We deviate from the original benchmark in two points:
* we return only the training time as cost, instead of the total elapsed time
* as the objective to minimize we now return `1 - accuracy` instead of `-100 * accuracy`, to achieve better output scaling (see the sketch below)

Co-authored-by: ayushi-3536 <ayushi-3536@github.com>
Co-authored-by: Philipp Müller <muller-phil@gmx.net>
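
For reference, a rough sketch of how these two deviations show up in the returned result dict. The objective name and the keys inside `info` are assumptions, and the variables refer to the quantities from the timing sketch above; only the top-level `function_value` / `cost` / `info` layout follows the usual HPOBench convention.

```python
# Illustrative only -- objective and info key names are assumptions.
result = {
    "function_value": {
        # minimize 1 - accuracy instead of -100 * accuracy,
        # so the value stays in [0, 1] and scales better against other objectives
        "misclassification_rate": 1 - val_accuracy,
    },
    "cost": training_runtime,              # training time only, not the total elapsed time
    "info": {
        "valid_accuracy": val_accuracy,
        "test_accuracy": test_accuracy,
        "eval_valid_runtime": eval_valid_runtime,
    },
}
```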