Overwrite default tasks #2487

Open
jonoillar opened this issue Nov 13, 2024 · 0 comments

Context

I would like to build an interface on top of lm_eval.

My goal is to provide different sets of custom configurations for tasks.

User story

I have custom task config files in my repo, under custom_repo/tasks

Basically, if a user wants to benchmark a model on task task_name, I would like:

  • If there is a config file corresponding to task_name in custom_repo/tasks, e.g. custom_repo/tasks/task_name.yaml, use it
  • Else, use lm_eval/tasks/task_name.yaml
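The lookup order described above can be sketched as a small helper. This is only an illustration, assuming the flat one-file-per-task layout used in the bullets; `resolve_task_config` and its arguments are hypothetical names, not part of lm_eval.

```python
from pathlib import Path
from typing import Optional


def resolve_task_config(
    task_name: str, custom_dir: Path, default_dir: Path
) -> Optional[Path]:
    """Return the custom config for task_name if it exists, else the default one.

    Hypothetical helper: directories are checked in priority order and the
    first matching <task_name>.yaml wins. Returns None if neither exists.
    """
    for directory in (custom_dir, default_dir):
        candidate = directory / f"{task_name}.yaml"
        if candidate.is_file():
            return candidate
    return None
```

For example, `resolve_task_config("triviaqa", Path("custom_repo/tasks"), Path("lm_eval/tasks"))` would return the custom file when it is present and fall back to the default otherwise.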

Problem

Right now, if

  • a custom config file is provided on a task that already exists in lm_eval/tasks
  • include_defaults=True

then the task definition in lm_eval/tasks takes precedence.

Minimal reproducible script:

I'm using

  • python3.10
  • lm-eval[math,ifeval,sentencepiece]==0.4.5

from lm_eval.tasks import TaskManager, get_task_dict
from pathlib import Path


def main():
    task_name = "triviaqa"
    # Directory holding the custom task config (here: next to this script).
    config_path = str(Path(__file__).parent)
    # include_defaults=True also indexes the built-in tasks in lm_eval/tasks.
    task_manager = TaskManager(include_path=config_path, include_defaults=True)
    task_dict = get_task_dict(task_name, task_manager)
    print(task_dict["triviaqa"])


if __name__ == "__main__":
    main()

With the custom config file:

task: triviaqa
dataset_path: trivia_qa
dataset_name: rc.wikipedia.nocontext
output_type: generate_until
training_split: train
validation_split: validation
description: "Answer these questions:\n\n"
doc_to_text: "Q: {{question}}?\nA:"
doc_to_target: "{{answer.aliases}}"
num_fewshot: 5

I get:

ConfigurableTask(task_name=triviaqa,output_type=generate_until,num_fewshot=None,num_samples=17944)

By just changing to include_defaults=False when instantiating the TaskManager, the output becomes:

ConfigurableTask(task_name=triviaqa,output_type=generate_until,num_fewshot=5,num_samples=7993)

This is the custom configuration I set.

Could we have an option to choose whether custom config files overwrite the default ones?

Investigation

I took a look at the code. The mapping task_name <-> task_config_yaml_file is built in the initialize_tasks function:

https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/__init__.py#L82

    def initialize_tasks(
        self,
        include_path: Optional[Union[str, List]] = None,
        include_defaults: bool = True,
    ):
        """Creates a dictionary of tasks index.

        :param include_path: Union[str, List] = None
            An additional path to be searched for tasks recursively.
            Can provide more than one such path as a list.
        :param include_defaults: bool = True
            If set to false, default tasks (those in lm_eval/tasks/) are not indexed.
        :return
            Dictionary of task names as key and task metadata
        """
        if include_defaults:
            all_paths = [os.path.dirname(os.path.abspath(__file__)) + "/"]
        else:
            all_paths = []
        if include_path is not None:
            if isinstance(include_path, str):
                include_path = [include_path]
            all_paths.extend(include_path)

        task_index = {}
        for task_dir in all_paths:
            tasks = self._get_task_and_group(task_dir)
            task_index = {**tasks, **task_index}

        return task_index

When include_defaults=True, the first element of all_paths is the path to lm_eval/tasks.

Then, when creating the task_index variable, we iterate over the directories in all_paths.

However, the way task_index is updated is with this piece of code:

            task_index = {**tasks, **task_index}

In other words, if the same key exists in both the tasks dict and the task_index dict, the value already in task_index takes precedence. Since lm_eval/tasks is indexed first, its entries always win over the custom ones.
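The effect of the merge order can be checked in isolation with plain dicts. The snippet below is a minimal sketch: the dict contents are illustrative stand-ins for the real index entries, not what `_get_task_and_group` actually returns.

```python
# Stand-ins for the real index entries; the values are illustrative labels only.
default_tasks = {"triviaqa": "lm_eval/tasks", "gsm8k": "lm_eval/tasks"}
custom_tasks = {"triviaqa": "custom_repo/tasks"}

# Current behaviour: defaults are indexed first, and each new directory is
# merged *under* the existing index, so keys already present are kept.
task_index = {}
for tasks in (default_tasks, custom_tasks):
    task_index = {**tasks, **task_index}
current = task_index

# Swapped merge order: directories indexed later override earlier ones.
task_index = {}
for tasks in (default_tasks, custom_tasks):
    task_index = {**task_index, **tasks}
swapped = task_index

print(current["triviaqa"])  # lm_eval/tasks
print(swapped["triviaqa"])  # custom_repo/tasks
```

In a dict display with `**` unpacking, later entries overwrite earlier ones, which is why the position of `task_index` in the expression decides who wins.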

Possible solution

I see 2 possible solutions:

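  1. Change the line task_index = {**tasks, **task_index} to task_index = {**task_index, **tasks}, so that directories indexed later overwrite earlier ones.
  2. Add the lm_eval/tasks path to the end of the list all_paths instead of the beginning, so that custom paths are indexed first.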
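
To make the choice explicit rather than changing the default silently, option 1 could sit behind a flag. The following is a standalone sketch, not a patch against lm_eval: `build_task_index`, the `index_dir` callback (standing in for `self._get_task_and_group`) and the `override_defaults` parameter are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Union


def build_task_index(
    index_dir: Callable[[str], Dict[str, dict]],  # stand-in for _get_task_and_group
    default_path: str,
    include_path: Optional[Union[str, List[str]]] = None,
    include_defaults: bool = True,
    override_defaults: bool = False,  # hypothetical flag, not in lm_eval
) -> Dict[str, dict]:
    """Standalone sketch of the indexing loop with configurable precedence."""
    all_paths = [default_path] if include_defaults else []
    if include_path is not None:
        if isinstance(include_path, str):
            include_path = [include_path]
        all_paths.extend(include_path)

    task_index: Dict[str, dict] = {}
    for task_dir in all_paths:
        tasks = index_dir(task_dir)
        if override_defaults:
            # Option 1: directories indexed later overwrite earlier entries.
            task_index = {**task_index, **tasks}
        else:
            # Current behaviour: the first directory to define a task wins.
            task_index = {**tasks, **task_index}
    return task_index


# Fake indexer used only to exercise the sketch.
fake_dirs = {
    "lm_eval/tasks": {"triviaqa": {"num_fewshot": None}},
    "custom_repo/tasks": {"triviaqa": {"num_fewshot": 5}},
}
kept = build_task_index(fake_dirs.__getitem__, "lm_eval/tasks", "custom_repo/tasks")
overridden = build_task_index(
    fake_dirs.__getitem__, "lm_eval/tasks", "custom_repo/tasks", override_defaults=True
)
print(kept["triviaqa"], overridden["triviaqa"])
```

Option 2 gives the same result for a default-versus-custom conflict. The two differ only when several include paths define the same task: with option 1 the last path wins, while option 2 keeps the existing first-wins rule among the custom paths.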