Overwrite default tasks #2487

jonoillar · 2024-11-13T15:06:24Z

Context

I would like to build an interface on top of lm_eval.

Basically, my goal is to provide different sets of custom configurations for tasks.

User story

I have custom task config files in my repo, under custom_repo/tasks.

Basically, if a user wants to benchmark a model on task task_name, I would like:

If there is a config file corresponding to task_name in custom_repo/tasks, e.g. custom_repo/tasks/task_name.yaml, use it
Else, use lm_eval/tasks/task_name.yaml
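The lookup order I have in mind can be sketched as follows. This is only an illustration of the desired fallback, not lm_eval code: resolve_task_config is a hypothetical helper, and it assumes a flat layout where each task lives in a file named task_name.yaml (lm_eval itself indexes tasks by the task key inside the YAML, not by file name).

```python
from pathlib import Path
from typing import Optional


def resolve_task_config(
    task_name: str, custom_dir: Path, default_dir: Path
) -> Optional[Path]:
    """Return the config for task_name, preferring custom_dir over default_dir.

    Hypothetical helper: assumes one flat <task_name>.yaml file per task.
    """
    for directory in (custom_dir, default_dir):
        candidate = directory / f"{task_name}.yaml"
        if candidate.is_file():
            # The first directory that contains the file wins.
            return candidate
    # Neither directory defines this task.
    return None
```

With this rule, a task present in both directories resolves to the custom file, and a task present only in the default directory still resolves.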

Problem

Right now, if both of the following hold:

a custom config file is provided for a task that already exists in lm_eval/tasks
include_defaults=True

then the task defined in lm_eval/tasks takes precedence over the custom one.

Minimal reproducible script:

I'm using

python3.10
lm-eval[math,ifeval,sentencepiece]==0.4.5

from pathlib import Path

from lm_eval.tasks import TaskManager, get_task_dict


def main():
    task_name = "triviaqa"
    # Directory containing this script and the custom triviaqa config below.
    config_path = str(Path(__file__).parent)
    # include_defaults=True reproduces the problem: the default task wins.
    task_manager = TaskManager(include_path=config_path, include_defaults=True)
    task_dict = get_task_dict(task_name, task_manager)
    print(task_dict[task_name])


if __name__ == "__main__":
    main()

With the custom config file:

task: triviaqa
dataset_path: trivia_qa
dataset_name: rc.wikipedia.nocontext
output_type: generate_until
training_split: train
validation_split: validation
description: "Answer these questions:\n\n"
doc_to_text: "Q: {{question}}?\nA:"
doc_to_target: "{{answer.aliases}}"
num_fewshot: 5

With include_defaults=True, I get:

ConfigurableTask(task_name=triviaqa,output_type=generate_until,num_fewshot=None,num_samples=17944)

If I only change this to include_defaults=False when instantiating the TaskManager, it prints:

ConfigurableTask(task_name=triviaqa,output_type=generate_until,num_fewshot=5,num_samples=7993)

which is the custom configuration I set (num_fewshot=5).

Could we have an option to choose whether custom config files overwrite the default ones?

Investigation

I took a look at the code. The mapping task_name <-> task_config_yaml_file is built in the initialize_tasks function:

https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/__init__.py#L82

    def initialize_tasks(
        self,
        include_path: Optional[Union[str, List]] = None,
        include_defaults: bool = True,
    ):
        """Creates a dictionary of tasks index.

        :param include_path: Union[str, List] = None
            An additional path to be searched for tasks recursively.
            Can provide more than one such path as a list.
        :param include_defaults: bool = True
            If set to false, default tasks (those in lm_eval/tasks/) are not indexed.
        :return
            Dictionary of task names as key and task metadata
        """
        if include_defaults:
            all_paths = [os.path.dirname(os.path.abspath(__file__)) + "/"]
        else:
            all_paths = []
        if include_path is not None:
            if isinstance(include_path, str):
                include_path = [include_path]
            all_paths.extend(include_path)

        task_index = {}
        for task_dir in all_paths:
            tasks = self._get_task_and_group(task_dir)
            task_index = {**tasks, **task_index}

        return task_index

When include_defaults=True, the first element of all_paths is the path to lm_eval/tasks, and any include_path directories come after it.

Then, to build task_index, we iterate over the directories in all_paths in that order.

However, the way task_index is updated is with this piece of code:

            task_index = {**tasks, **task_index}

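That is, if the same key exists in both tasks and task_index, the entry already in task_index takes precedence, because in a dict display the later unpacking overrides the earlier one. Since lm_eval/tasks is processed first, its tasks are already in task_index when the custom directory is scanned, so the default config wins.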
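The effect of that merge order can be reproduced with plain dicts. This is a minimal sketch: the "default" and "custom" strings are placeholders standing in for the task metadata that _get_task_and_group returns, and my_task is a made-up task name.

```python
# Placeholder indexes: the values stand in for real task metadata.
default_tasks = {"triviaqa": "default", "gsm8k": "default"}
custom_tasks = {"triviaqa": "custom", "my_task": "custom"}

# Same loop shape as initialize_tasks, with the default directory first.
task_index = {}
for tasks in [default_tasks, custom_tasks]:
    # Later unpacking wins, so entries already in task_index are kept.
    task_index = {**tasks, **task_index}

print(task_index["triviaqa"])  # default
print(task_index["my_task"])   # custom
```

New task names from the custom directory are added, but a name that collides with a default task keeps the default entry.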

Possible solution

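I see 2 possible solutions:

Change the line task_index = {**tasks, **task_index} to task_index = {**task_index, **tasks}, so that directories processed later override earlier ones
Append the lm_eval/tasks path to the end of all_paths instead of putting it first, so that the custom directories are indexed before the defaults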
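Both options can be compared on placeholder data. This is a sketch under assumptions, not a patch: the two functions are hypothetical stand-ins for the merge loop in initialize_tasks, and the string values stand in for task metadata.

```python
from typing import Dict, List

TaskIndex = Dict[str, str]


def merge_later_wins(per_dir: List[TaskIndex]) -> TaskIndex:
    """Option 1: swapped unpacking order, later directories override."""
    index: TaskIndex = {}
    for tasks in per_dir:
        index = {**index, **tasks}
    return index


def merge_first_wins(per_dir: List[TaskIndex]) -> TaskIndex:
    """Current merge rule: the first directory to define a task keeps it."""
    index: TaskIndex = {}
    for tasks in per_dir:
        index = {**tasks, **index}
    return index


defaults = {"triviaqa": "default", "gsm8k": "default"}
custom = {"triviaqa": "custom"}

# Option 1: keep the path order (defaults first), change the merge rule.
option_1 = merge_later_wins([defaults, custom])
# Option 2: keep the merge rule, put the defaults at the end of the list.
option_2 = merge_first_wins([custom, defaults])

print(option_1 == option_2)  # True
print(option_1["triviaqa"])  # custom
```

Either change alone gives the custom config precedence while keeping the other default tasks. Both would change the behavior for every user, so exposing the choice as a parameter may be safer than flipping the default.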
