Issues in evaluation code.

The evaluation script you've provided, "run_eval.py", does not appear to be consistent with the benchmark dataset. I've run into several issues and would like to know how to resolve them:
1. args.geobenchmark is set to "npee", but there is no benchmark file with that name. I read in another issue that you suggested replacing it with "geobenchmark_npee.json". However, I am not sure how the code will run when we pass the apstudy.json benchmark, since the code is written only for npee.

2. In the image below, for the code line "**for the_answer_is in ['wa', 'woa']**", can you please explain what 'wa' and 'woa' mean? They are not mentioned anywhere in the code base or in the npee dataset.

3. In the same image, for the code line "**source = source_target['source'][question_type][the_answer_is]**": if you load the npee.json benchmark file as JSON, source_target['source'] raises a KeyError, because only 6 keys are available: ['noun', 'choice', 'completion', 'tf', 'qa', 'discussion']. So this key seems to be wrong.

4. Moreover, even if "**source_target[question_type][the_answer_is]**" is the correct format, "**the_answer_is**" still raises a KeyError, since only _['question', 'answer']_ exist in the "**choice**" element of the npee file. What is the right format?
![image](https://github.com/davendw49/k2/assets/56957881/2d96a504-011d-4d85-9427-d01596264b72)

5. How do you evaluate and test on the apstudy.json benchmark, given that the code is not written for it?
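
To make points 3 and 4 easier to reproduce, here is a minimal sketch of the lookups that fail for me. It does not load the real file: `source_target` is a small stand-in dictionary with the six top-level keys listed above, and I am assuming every entry has the same 'question' and 'answer' layout that I see under "choice". The values are placeholders, and `try_lookup` is a hypothetical helper written for this illustration, not something from run_eval.py.

```python
# Stand-in for the loaded npee benchmark. The key names follow what I see
# in the file; the values are placeholders, and the assumption that every
# question type has 'question' and 'answer' is mine.
source_target = {
    key: {"question": ["placeholder question"], "answer": ["placeholder answer"]}
    for key in ["noun", "choice", "completion", "tf", "qa", "discussion"]
}


def try_lookup(data, *keys):
    """Walk nested dictionary keys and report the first one that is missing."""
    current = data
    for key in keys:
        try:
            current = current[key]
        except KeyError:
            return f"KeyError: {key!r}"
    return "ok"


# Lookup as written in run_eval.py: fails at the first key.
result_source = try_lookup(source_target, "source", "choice", "wa")
# Lookup without the 'source' level: fails at 'wa'.
result_direct = try_lookup(source_target, "choice", "wa")
# Lookup using a key that does exist under "choice".
result_question = try_lookup(source_target, "choice", "question")

print(result_source)
print(result_direct)
print(result_question)
```

With this layout, only the last lookup succeeds, which is why I suspect the script expects a differently structured file than the one published.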

