Cannot find the GW-MoE implementation in either the training script or the model code #2

@youzark

The paper says the experiments were run with both T5 and jet, but I cannot find any GW-related implementation in either model, and the train (eval) scripts only contain config related to GW-MoE. Has the relevant code not been released yet, or did I just miss it?
