tAnGjIa520

No description provided.

@puyuan1996 added the research, polish, and enhancement labels on Sep 18, 2025
@puyuan1996 changed the base branch from dev-uz-mt-balance to dev-multitask-balance-clean on September 18, 2025 11:38
from tensorboardX import SummaryWriter

from lzero.entry.utils import log_buffer_memory_usage, TemperatureScheduler
# 添加性能监控相关导入
Collaborator

Please rewrite the comments in English, and follow one consistent convention, such as the format used in DI-engine/ding/model/common/encoder.py (main branch of opendilab/DI-engine).

Collaborator

You only need to fix the MoE gradient changes that this PR adds on top of the dev-multitask-balance-clean branch. I will fix the comment conventions inside dev-multitask-balance-clean myself.
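
For reference, the DI-engine file mentioned in this thread documents functions with a docstring split into `Overview`, `Arguments`, and `Returns` sections. The sketch below shows that layout on a hypothetical helper. The function name and the choice of cosine similarity as a gradient conflict measure are assumptions made for illustration; the PR excerpt does not show how the metric is actually computed.

```python
import math
from typing import Sequence


def grad_cosine_similarity(grad_a: Sequence[float], grad_b: Sequence[float]) -> float:
    """
    Overview:
        Compute the cosine similarity between two flattened gradient vectors.
        Negative values indicate that the two gradients point in conflicting directions.
    Arguments:
        - grad_a (:obj:`Sequence[float]`): The first flattened gradient vector.
        - grad_b (:obj:`Sequence[float]`): The second flattened gradient vector, same length as ``grad_a``.
    Returns:
        - similarity (:obj:`float`): Cosine similarity in ``[-1, 1]``, or ``0.0`` if either vector is all zeros.
    """
    # Hypothetical helper: not taken from the PR under review.
    if len(grad_a) != len(grad_b):
        raise ValueError("gradient vectors must have the same length")
    dot = sum(a * b for a, b in zip(grad_a, grad_b))
    norm_a = math.sqrt(sum(a * a for a in grad_a))
    norm_b = math.sqrt(sum(b * b for b in grad_b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Similarity is undefined for a zero vector; report "no conflict" instead.
        return 0.0
    return dot / (norm_a * norm_b)
```

The same three-section layout applies to classes and methods; only the section contents change.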

@puyuan1996 changed the title from "feature(tj): Add monitoring for the gradient conflict metric of MoE." to "feature(tj): addd monitoring for the gradient conflict metric of MoE in ScaleZero" on Sep 18, 2025
@@ -0,0 +1,2207 @@
diff --git a/lzero/entry/train_unizero_multitask_segment_ddp.py b/lzero/entry/train_unizero_multitask_segment_ddp.py
Collaborator

Delete the files that are not used.


import sys
import os
PROJECT_ROOT = os.path.abspath("/fs-computility/niuyazhe/tangjia/github/LightZero") # 或者直接写死路径
Collaborator

Is this necessary? If it is, write it as a relative path and do not include personal information.
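
One way to act on this comment is to derive the project root from the location of the script itself instead of hard-coding an absolute path. The sketch below is a suggestion, not the change made in the PR. The helper names are hypothetical, and using `setup.py` as the marker file for the repository root is an assumption.

```python
import sys
from pathlib import Path


def find_project_root(start: Path, marker: str = "setup.py") -> Path:
    """Walk upward from `start` and return the first directory that contains `marker`."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"no parent of {start} contains {marker!r}")


def add_project_root_to_path(start: Path, marker: str = "setup.py") -> Path:
    """Locate the project root and put it at the front of sys.path if it is missing."""
    root = find_project_root(start, marker)
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


# Typical use at the top of an entry script (hypothetical):
# PROJECT_ROOT = add_project_root_to_path(Path(__file__))
```

Because the root is resolved at run time, the script no longer embeds a user name or machine-specific directory.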

# ------------------------------------------------------------
# MOE专家选择统计相关函数
# ------------------------------------------------------------
def merge_expert_stats_across_ranks(all_expert_stats):
Collaborator

Move general-purpose utility functions into utils.
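
The excerpt does not show the body of `merge_expert_stats_across_ranks`, so the following is only a sketch of what a rank-merging helper could look like once it lives in a utils module. The function name and the data layout (each rank reports a dict that maps a task ID to a list of per-expert selection counts) are assumptions for illustration.

```python
from typing import Dict, List, Sequence


def merge_task_expert_counts(per_rank_counts: Sequence[Dict[int, List[int]]]) -> Dict[int, List[int]]:
    """Sum per-expert selection counts for each task across all ranks."""
    # Hypothetical helper: the real function in the PR may use a different layout.
    merged: Dict[int, List[int]] = {}
    for rank_counts in per_rank_counts:
        for task_id, counts in rank_counts.items():
            if task_id not in merged:
                # First time this task is seen: copy so the input is not mutated.
                merged[task_id] = list(counts)
                continue
            if len(counts) != len(merged[task_id]):
                raise ValueError(f"task {task_id}: inconsistent number of experts")
            merged[task_id] = [a + b for a, b in zip(merged[task_id], counts)]
    return merged
```

A pure function like this has no dependency on the training entry script, which is what makes it easy to move into utils and test in isolation.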

_GLOBAL_HEATMAP_FIG.set_size_inches(figsize)
return _GLOBAL_HEATMAP_FIG, _GLOBAL_HEATMAP_AX

def create_heatmap_with_values_fast(matrix, task_ids, title="Task-Expert Selection Frequencies"):
Collaborator

Same here: move general-purpose utility functions into utils.

'wasserstein_stats': wasserstein_stats
}

def create_similarity_heatmap_no_diagonal(similarity_matrix, task_ids, metric_name, title_suffix=""):
Collaborator

The utility functions are already in utils, so delete the unused copies from lzero/entry/train_unizero_multitask_segment_ddp.py.

betas=(0.9, 0.95),
)

# self.a=1
Collaborator

Delete this.

# "max_moe_layer_grad_conflict",
]

# # # If the model uses MoE, add expert gradient conflict variables
Collaborator

Delete the unused comments.

import os
PROJECT_ROOT = os.path.abspath("/fs-computility/niuyazhe/tangjia/github/LightZero") # 或者直接写死路径
sys.path.insert(0, PROJECT_ROOT)
# /fs-computility/niuyazhe/tangjia/github/LightZero/zoo/atari/config/atari_unizero_multitask_segment_ddp_config_debug.py
Collaborator

Delete the debug config.


# 手动销毁进程组
# 手动销毁进程组 /fs-computility/niuyazhe/tangjia/github/LightZero/zoo/atari/config/atari_unizero_multitask_segment_ddp_config_debug.py
if dist.is_initialized():
Collaborator

In the config, keep only the settings that we used for the analysis in our ScaleZero paper.

Labels: enhancement (New feature or request), polish (Polish algorithms, tests or configs), research (Research work in progress)
2 participants