The "run_eval.py" script you've provided does not appear to be consistent with the benchmark dataset you've provided. I've run into a few issues and would like to know how to resolve them:
- args.geobenchmark is set to "npee", but there is no benchmark file with that name. I've read in another issue that you suggested replacing it with "geobenchmark_npee.json". However, I am not sure how the code will run when the apstudy.json benchmark is passed instead, since the code appears to be written only for npee.
- In the following image, regarding the code line "for the_answer_is in ['wa', 'woa']": can you please explain what 'wa' and 'woa' mean? They are not mentioned anywhere in the code base or in the npee dataset.
- In the same image, regarding the code line "source = source_target['source'][question_type][the_answer_is]": if you load the npee.json benchmark file as JSON, source_target['source'] raises a KeyError, because only 6 keys are available: ['noun', 'choice', 'completion', 'tf', 'qa', 'discussion']. So this key seems to be wrong.
- Moreover, even if "source_target[question_type][the_answer_is]" is the intended format, "the_answer_is" still raises a KeyError, because only ['question', 'answer'] exist in the "choice" element of the npee file. What is the right format?
- How do you evaluate and test the apstudy.json benchmark, given that the code is not written for it?
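
To make the key problems above easier to reproduce, here is a minimal sketch of the lookups involved. It does not read the real benchmark file: the `sample` dictionary is a made-up miniature that mirrors only the layout I see after loading npee.json (six question types, each with 'question' and 'answer'), the placeholder strings are invented, and `try_lookup` is a hypothetical helper written just for this illustration.

```python
import json

# ASSUMPTION: invented stand-in for the npee benchmark. Only the key layout
# (six question types, each holding 'question' and 'answer') comes from what
# I observed; the values are placeholders.
QUESTION_TYPES = ["noun", "choice", "completion", "tf", "qa", "discussion"]
sample = {
    qtype: {
        "question": [f"sample {qtype} question"],
        "answer": [f"sample {qtype} answer"],
    }
    for qtype in QUESTION_TYPES
}

# Round-trip through JSON so the object is what json.load() would return.
source_target = json.loads(json.dumps(sample))


def try_lookup(data, *keys):
    """Hypothetical helper: follow keys into nested dicts.

    Returns (value, None) on success, or (None, first_missing_key).
    """
    node = data
    for key in keys:
        try:
            node = node[key]
        except KeyError:
            return None, key
    return node, None


# 1) The lookup as written in run_eval.py: fails at 'source'.
_, missing_in_script = try_lookup(source_target, "source", "choice", "wa")

# 2) The same lookup without 'source': fails at 'wa'.
_, missing_without_source = try_lookup(source_target, "choice", "wa")

# 3) A lookup that matches the keys present in the file.
questions, missing_direct = try_lookup(source_target, "choice", "question")

print("script lookup fails at:", missing_in_script)
print("lookup without 'source' fails at:", missing_without_source)
print("direct lookup fails at:", missing_direct)
```

With this layout, only the third lookup succeeds, which is why I suspect run_eval.py expects a differently structured file than the one that was published.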