# Excerpt: initialize_tasks from lm_eval/tasks/__init__.py
import os
from typing import List, Optional, Union


def initialize_tasks(
    self,
    include_path: Optional[Union[str, List]] = None,
    include_defaults: bool = True,
):
    """Creates a dictionary of tasks index.

    :param include_path: Union[str, List] = None
        An additional path to be searched for tasks recursively.
        Can provide more than one such path as a list.
    :param include_defaults: bool = True
        If set to false, default tasks (those in lm_eval/tasks/) are not indexed.
    :return:
        Dictionary of task names as key and task metadata
    """
    if include_defaults:
        # The default lm_eval/tasks/ directory becomes the first search path.
        all_paths = [os.path.dirname(os.path.abspath(__file__)) + "/"]
    else:
        all_paths = []
    if include_path is not None:
        if isinstance(include_path, str):
            include_path = [include_path]
        all_paths.extend(include_path)

    task_index = {}
    for task_dir in all_paths:
        tasks = self._get_task_and_group(task_dir)
        # Keys already in task_index override those in tasks,
        # so paths earlier in all_paths take precedence.
        task_index = {**tasks, **task_index}
    return task_index
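The merge order in the loop above can be reproduced with plain dictionaries. This is a minimal sketch, assuming each directory scan returns a dict of task name to YAML path; the `scan` table and its contents are hypothetical stand-ins for `self._get_task_and_group(task_dir)`. Because `task_index` is unpacked last, entries collected from earlier paths survive, so the default directory, which is first in `all_paths`, wins.

```python
# Hypothetical stand-in for self._get_task_and_group(task_dir):
# each directory maps task names to the YAML file that defines them.
scan = {
    "lm_eval/tasks/": {
        "task_name": "lm_eval/tasks/task_name.yaml",
        "other_task": "lm_eval/tasks/other_task.yaml",
    },
    "custom_repo/tasks/": {"task_name": "custom_repo/tasks/task_name.yaml"},
}


def build_index(all_paths):
    """Mirror the loop in initialize_tasks using the hypothetical scan table."""
    task_index = {}
    for task_dir in all_paths:
        tasks = scan[task_dir]
        # The right-most unpacking wins: existing task_index keys override tasks.
        task_index = {**tasks, **task_index}
    return task_index


# include_defaults=True puts the default directory first.
index = build_index(["lm_eval/tasks/", "custom_repo/tasks/"])
print(index["task_name"])  # lm_eval/tasks/task_name.yaml
```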
Context
I would like to build an interface on top of lm_eval.
Basically, my goal is to provide different sets of custom configurations for tasks.
User story
I have custom task config files in my repo, under custom_repo/tasks.
If a user wants to benchmark a model on a task task_name, I would like:
- if task_name is in custom_repo/tasks (e.g. custom_repo/tasks/task_name.yaml), use it;
- otherwise, use lm_eval/tasks/task_name.yaml.
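The desired lookup amounts to a first-match search over an ordered list of directories. This is a minimal sketch, assuming each config is stored as `<task_name>.yaml` directly inside a directory; `resolve_task_config` is a hypothetical helper, not part of lm_eval, and the real index also searches subdirectories recursively.

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve_task_config(task_name: str, search_dirs: Sequence[str]) -> Optional[Path]:
    """Return the first <task_name>.yaml found, checking directories in order.

    Hypothetical helper: listing custom_repo/tasks before lm_eval/tasks
    gives the custom configs priority over the defaults.
    """
    for directory in search_dirs:
        candidate = Path(directory) / f"{task_name}.yaml"
        if candidate.is_file():
            return candidate
    # No directory defines this task.
    return None
```

For example, `resolve_task_config("task_name", ["custom_repo/tasks", "lm_eval/tasks"])` returns the custom file when it exists and falls back to the default one otherwise.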
Problem
Right now, if a task with the same name exists in both custom_repo/tasks and lm_eval/tasks, and include_defaults=True, then the lm_eval/tasks version takes precedence.
Minimal reproducible script:
I'm using
With the custom config file:
I get:
By just changing to include_defaults=False when instantiating the TaskManager, the custom configuration I set is printed instead.
Could we have the option to choose whether custom config files override the default ones?
Investigation
I took a look at the code. Basically, the mapping task_name <-> task_config_yaml_file is defined in the initialize_tasks function: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/__init__.py#L82
When include_defaults=True, the first element of the variable all_paths is the path to lm_eval/tasks. Then, when creating the task_index variable, we iterate over the directories in all_paths. However, task_index is updated with this piece of code:
task_index = {**tasks, **task_index}
This means that if the same key is in both the tasks dict and the task_index dict, the value already in task_index takes precedence. Since lm_eval/tasks is processed first, its tasks override any same-named task found in include_path.
Possible solution
I see 2 possible solutions:
1. Change the line task_index = {**tasks, **task_index} to task_index = {**task_index, **tasks}
2. Add the lm_eval/tasks path to the end of the list all_paths, after the include_path entries
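Both proposed solutions can be checked against a hypothetical scan table. This is a sketch, assuming each directory scan yields a dict of task name to YAML path; `SCAN`, `DEFAULT_DIR` and the `solution` argument are illustrative only, not lm_eval parameters. Either change lets an include_path entry override a default task with the same name.

```python
from typing import Dict, List

# Hypothetical scan results: directory -> {task name: YAML path}.
SCAN: Dict[str, Dict[str, str]] = {
    "lm_eval/tasks/": {
        "task_name": "lm_eval/tasks/task_name.yaml",
        "other_task": "lm_eval/tasks/other_task.yaml",
    },
    "custom_repo/tasks/": {"task_name": "custom_repo/tasks/task_name.yaml"},
}
DEFAULT_DIR = "lm_eval/tasks/"


def build_index(include_path: List[str], solution: int) -> Dict[str, str]:
    """Build the task index with proposed solution 1 or 2 applied."""
    if solution == 1:
        # Solution 1: defaults stay first, but later paths overwrite.
        all_paths = [DEFAULT_DIR] + include_path
    else:
        # Solution 2: original merge order, defaults moved to the end.
        all_paths = include_path + [DEFAULT_DIR]
    task_index: Dict[str, str] = {}
    for task_dir in all_paths:
        tasks = SCAN[task_dir]
        if solution == 1:
            task_index = {**task_index, **tasks}  # newer entries win
        else:
            task_index = {**tasks, **task_index}  # older entries win
    return task_index


for solution in (1, 2):
    index = build_index(["custom_repo/tasks/"], solution)
    print(solution, index["task_name"], index["other_task"])
```

The two solutions differ when several include paths define the same task: with solution 1 the last listed include path wins, with solution 2 the first one does.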